MapReduce, mail # dev - DistCpV2 in 0.23


Re: DistCpV2 in 0.23
Eli Collins 2011-08-26, 14:39
On Friday, August 26, 2011, Amareshwari Sri Ramadasu <[EMAIL PROTECTED]> wrote:
> Agree. It should be a separate maven module (and the patch puts it in a
> separate maven module now). A top level for hadoop tools is nice to have,
> but it becomes hard to maintain until patch automation runs the tests under
> tools. Currently we often see changes in HDFS affecting RAID tests in
> MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>
> I propose we can have something like the following:
>
> trunk/
>  - hadoop-mapreduce
>      - hadoop-mr-client
>      - hadoop-yarn
>      - hadoop-tools
>          - hadoop-streaming
>          - hadoop-archives
>          - hadoop-distcp
>
> Thoughts?
>
> @Eli and @JD, we did not replace the old legacy distcp because this is
> really a complete rewrite, and we did not want to remove it until users are
> familiar with the new one.

That makes sense; we have a similar situation with hftp and hoop. My only
input is that the new distcp shouldn't be in contrib.

Thanks,
Eli
>
> On 8/26/11 12:51 AM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:
>
> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> in there as well - ie tools that are downstream of MR and/or HDFS.
>
> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <[EMAIL PROTECTED]> wrote:
>> +1 for a separate module in hadoop-mapreduce-project. I think
>> hadoop-mapreduce-client might not be the right place for it. We might have
>> to pick a new maven module under hadoop-mapreduce-project that could
>> host streaming/distcp/hadoop archives.
>>
>> thanks
>> mahadev
>>
>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>>> Agree, it should be a separate maven module.
>>>
>>> And it should be under hadoop-mapreduce-client, right?
>>>
>>> And now that we are on the topic, the same should go for streaming, no?
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <[EMAIL PROTECTED]> wrote:
>>>> > Nice work!   I definitely think this should go in 23 and 20x.
>>>> >
>>>> > Agree with JD that it should be in the core code, not contrib.  If
>>>> > it's going to be maintained then we should put it in the core code.
>>>>
>>>> Now that we're all mavenized, though, a separate maven module and
>>>> artifact does make sense IMO - ie "hadoop jar
>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>
>>>> -Todd
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>