MapReduce, mail # dev - DistCpV2 in 0.23


Re: DistCpV2 in 0.23
Alejandro Abdelnur 2011-08-26, 16:48
And I'll be more than happy to review it from the Mavenization perspective.

Thxs.

Alejandro

On Fri, Aug 26, 2011 at 9:47 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

> Please, don't add more Mavenization work on us (eventually I want to go
> back to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> [EMAIL PROTECTED]> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv.  If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" <[EMAIL PROTECTED]> wrote:
>>
>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in.  It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's suggestion. I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
>> separate modules right now, there is still tight coupling between the
>> different pieces, especially with tests.  IMO until we can reduce that
>> coupling we should treat building and testing Hadoop as a single project
>> instead of trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
>> [EMAIL PROTECTED]> wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>
>> wrote:
>> > I'd suggest putting hadoop-tools either at trunk/ level or having a tools
>> > aggregator module for hdfs and another for common.
>> >
>> > I personally would prefer at trunk/.
>> >
>> > Thanks.
>> >
>> > Alejandro
>> >
>> > On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Agree. It should be a separate maven module (and the patch puts it as a
>> >> separate maven module now). And a top level for hadoop tools is nice to
>> >> have, but it becomes hard to maintain until patch automation tests run the
>> >> tests under tools. Currently we often see changes in HDFS affecting RAID
>> >> tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>> >>
>> >> I propose we can have something like the following:
>> >>
>> >> trunk/
>> >>  - hadoop-mapreduce
>> >>      - hadoop-mr-client
>> >>      - hadoop-yarn
>> >>      - hadoop-tools
>> >>          - hadoop-streaming
>> >>          - hadoop-archives
>> >>          - hadoop-distcp
>> >>
>> >> Thoughts?
>> >>
>> >> @Eli and @JD, we did not replace the old legacy distcp because this is
>> >> really a complete rewrite, and we did not want to remove it until users are
>> >> familiar with the new one.
>> >>
>> >> On 8/26/11 12:51 AM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go