-
Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Amareshwari Sri Ramadasu 2011-08-29, 08:43
Some questions on making hadoop-tools top level under trunk:

1. Should the patches for tools be created against Hadoop Common?
2. What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
3. Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?

Thanks
Amareshwari

On 8/26/11 10:17 PM, "Alejandro Abdelnur" <[EMAIL PROTECTED]> wrote:

Please, don't add more Mavenization work on us (eventually I want to go back to coding)

Given that Hadoop is already Mavenized, the patch should be Mavenized.

What will have to be done extra (besides Mavenizing distcp) is to create a hadoop-tools module at root level and within it a hadoop-distcp module.

The hadoop-tools POM will look pretty much like the hadoop-common-project POM.

The hadoop-distcp POM should follow the hadoop-common POM patterns.

Thanks.

Alejandro

On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <[EMAIL PROTECTED]> wrote:

> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
> tasks. Since DistCp code is ready to be committed, it need not wait for the
> Tools separation from MR/HDFS.
> I would say it can go into contrib as the patch is now, and when the tools
> restructuring happens it would be just an svn mv. If there are no issues
> with this proposal I can commit the code tomorrow.
>
> Thanks
> Amareshwari
>
> On 8/26/11 7:45 PM, "Robert Evans" <[EMAIL PROTECTED]> wrote:
>
> I agree with Mithun. They are related but this goes beyond distcpv2 and
> should not block distcpv2 from going in. It would be very nice, however, to
> get the layout settled soon so that we all know where to find something when
> we want to work on it.
>
> Also +1 for Alejandro's. I also prefer to keep tools at the trunk level.
>
> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
> modules right now, there is still tight coupling between the different
> pieces, especially with tests. IMO until we can reduce that coupling we
> should treat building and testing Hadoop as a single project instead of
> trying to keep them separate.
>
> --Bobby
>
> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <[EMAIL PROTECTED]> wrote:
>
> Would it be acceptable if retooling of tools/ were taken up separately? It
> sounds to me like this might be a distinct (albeit related) task.
>
> Mithun
>
> ________________________________
> From: Giridharan Kesavan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, August 26, 2011 12:04 PM
> Subject: Re: DistCpV2 in 0.23
>
> +1 to Alejandro's
>
> I prefer to keep the hadoop-tools at trunk level.
>
> -Giri
>
> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> > I'd suggest putting hadoop-tools either at trunk/ level or having a tools
> > aggregator module for hdfs and other for common.
> >
> > I personally would prefer at trunk/.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <[EMAIL PROTECTED]> wrote:
> >
> >> Agree. It should be separate maven module (and patch puts it as separate
> >> maven module now). And top level for hadoop tools is nice to have, but it
> >> becomes hard to maintain until patch automation tests run the tests under
> >> tools. Currently we see many times the changes in HDFS affecting RAID tests
> >> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>
> >> I propose we can have something like the following:
> >>
> >> trunk/
> >>  - hadoop-mapreduce
> >>      - hadoop-mr-client
> >>      - hadoop-yarn
> >>      - hadoop-tools
> >>          - hadoop-streaming
> >>          - hadoop-archives
> >>          - hadoop-distcp
> >>
> >> Thoughts?
> >>
> >> @Eli and @JD, we did not replace old legacy distcp because this is really a
> >> complete rewrite and did not want to remove it until users are familiarized
> >> with new one.
> >>
> >> On 8/26/11 12:51 AM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:
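
For readers less familiar with Maven, the hadoop-tools module Alejandro describes (a root-level module containing a hadoop-distcp module) would be an aggregator POM. The following is only a minimal sketch of that idea, not the POM from the actual patch: the parent coordinates, the version and the relative path are assumptions made for illustration, and only the module names hadoop-tools and hadoop-distcp come from the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical hadoop-tools/pom.xml: an aggregator module with "pom" packaging. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Assumed parent: the real parent coordinates, version and path may differ. -->
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
    <relativePath>../hadoop-project</relativePath>
  </parent>

  <!-- groupId and version are inherited from the parent. -->
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>
  <name>Hadoop Tools (illustrative)</name>

  <!-- Each tool lives in its own child directory under hadoop-tools/. -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

With a layout like this, adding another tool would mean adding one more `<module>` entry and a child directory with its own POM.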
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Allen Wittenauer 2011-08-29, 18:40
I have a feeling this discussion should get moved to common-dev or even to general.

My #1 question is if tools is basically contrib reborn. If not, what makes it different?
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Amareshwari Sri Ramadasu 2011-08-30, 08:01
Copying common-dev.

Summarizing the discussion so far: What should be the tools layout after mavenization?

Option #1: Have hadoop-tools at top level, i.e.
trunk/
  hadoop-tools/
    hadoop-distcp/
Pros:
  Cleaner layout.
  In future, tools could be released separately from Hadoop releases.
Cons: Difficult to maintain.

Option #2: Keep a tools aggregator module under MapReduce/HDFS/Common for the tools that depend on MapReduce/HDFS/Common respectively.
For example:
hadoop-mapreduce-project/
  hadoop-mr-tools/
    hadoop-distcp/
Pros: Easy to maintain.
Cons: Still has tight coupling with related projects.

Personally, I'm fine with any of the above options. Looking for suggestions and reaching a consensus on this.

Thanks
Amareshwari
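
In Maven terms, the two options in the summary differ mainly in which POM lists the tools module. The fragments below are a hedged sketch of that difference; they are hypothetical excerpts, not the contents of the real trunk POMs, and the sibling module entries shown for Option #1 are assumed for illustration.

```xml
<!-- Option #1: excerpt of a hypothetical trunk/pom.xml.
     hadoop-tools is a sibling of the other top-level projects. -->
<modules>
  <module>hadoop-common-project</module>
  <module>hadoop-mapreduce-project</module>
  <module>hadoop-tools</module>
</modules>

<!-- Option #2: excerpt of a hypothetical hadoop-mapreduce-project/pom.xml.
     The tools aggregator (hadoop-mr-tools, containing hadoop-distcp)
     is built as part of the MapReduce project. -->
<modules>
  <module>hadoop-mr-tools</module>
</modules>
```

Under Option #2 the tools would be compiled and tested whenever the MapReduce project is built, which is one reading of the "easy to maintain" point; under Option #1 they would need to be covered by a build that starts from trunk/ or from hadoop-tools itself.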
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Vinod Kumar Vavilapalli 2011-08-30, 12:43
As long as hadoop-tools is in some directory at some depth under trunk, release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
 -- Sources at tools/trunk/hadoop-distcp
 -- Each tool will work with a specific version of Hadoop core.
 -- Releases can really be separate
(2) Same source tree: trunk/
 -- Sources at either (1.1) trunk/hadoop-tools or (1.2) trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
 -- Given release isn't decoupled anyway, either will work. (1.2) is preferable if building mapreduce builds the tools also.

+Vinod
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Mithun Radhakrishnan 2011-09-06, 05:28
I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.

It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might be opening doors to adding too much, and complicating the build/release. I'd appreciate advice on which way is best.

In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.

Mithun
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Amareshwari Sri Ramadasu 2011-09-06, 07:13
+ Copying common dev.
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Arun C Murthy 2011-09-06, 07:19
+1
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Vinod Kumar Vavilapalli 2011-09-06, 16:30
On Tue, Sep 6, 2011 at 10:58 AM, Mithun Radhakrishnan <[EMAIL PROTECTED]> wrote:

> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
> hoping that's going to be acceptable to this forum. This way, moving it out
> to a separate source tree should be easier.

+1 for moving forward with this proposal.

We still need to answer Amareshwari's question (2) she asked some time back about the automated code compilation and test execution of the tools module.

Right now we have separate automated builds for common, hdfs and mapreduce. If we go with the above proposal, we need to set up automated builds for the tools modules and possibly tie the related JIRA/Jenkins emails to the common-project lists.

> It would be nice to have clarity on how tools will be dealt with. It'd be
> convenient to distcp in trunk. (It's tiny and useful.) On the other hand,
> that might be opening doors to adding too much, and complicating the
> build/release. I'd appreciate advice on which way is best.
>
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized
> version of things, as per Tucu's suggestions.

+1

Thanks,
+Vinod
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Allen Wittenauer 2011-09-06, 17:11
On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> We still need to answer Amareshwari's question (2) she asked some time back
> about the automated code compilation and test execution of the tools module.

>>> My #1 question is if tools is basically contrib reborn. If not, what makes
>>> it different?

I'm still waiting for this answer as well.

Until such, I would be pretty much against a tools module. Changing the name of the dumping ground doesn't make it any less of a dumping ground.
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Eli Collins 2011-09-06, 23:32
On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>> We still need to answer Amareshwari's question (2) she asked some time back
>> about the automated code compilation and test execution of the tools module.
>
>>>> My #1 question is if tools is basically contrib reborn. If not, what makes
>>>> it different?
>
> I'm still waiting for this answer as well.
>
> Until such, I would be pretty much against a tools module. Changing the name of the dumping ground doesn't make it any less of a dumping ground.

IMO, if the tools module only gets stuff like distcp that's maintained, then it's not contrib; if it contains all the stuff from the current MR contrib, then tools is just a relabeling of contrib. Given that this proposal only covers moving distcp to tools, it doesn't sound like contrib to me.

Thanks,
Eli
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Allen Wittenauer 2011-09-06, 23:46
On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>
> IMO if the tools module only gets stuff like distcp that's maintained
> then it's not contrib, if it contains all the stuff from the current
> MR contrib then tools is just a re-labeling of contrib. Given that
> this proposal only covers moving distcp to tools it doesn't sound like
> contrib to me.

At one point, everything in contrib was maintained. So I guess the big question is: what are the gating criteria for something to get entry into tools?
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Eric Yang 2011-09-07, 01:38
Option #2, proposed by Amareshwari, seems like the better proposal. We don't want to repeat the history of contrib with hadoop-tools. Having a generic module like hadoop-tools increases the risk of accumulating dead code. It would be better to categorize the HDFS- or MapReduce-specific tools in their respective subcategories. That is also easier to manage from a packaging/deployment perspective.
regards,
Eric
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Alejandro Abdelnur 2011-09-07, 01:55
Eric,
Personally, I'm fine either way.

Still, I fail to see why generic versus categorized tools modules would increase or reduce the risk of dead code, or how they would make packaging and deployment harder or easier.

Would you please explain this?

Thanks.

Alejandro
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Vinod Kumar Vavilapalli 2011-09-07, 13:32
There are a bunch of so-called tools in hadoop-mapreduce-project/src/tools: DistCp, HadoopArchives, Rumen etc. And contrib projects are in src/contrib in all of the common, hdfs and mapred source trees. Not sure how the distinction was ever made.

The last time we had a discussion about moving contrib projects out of the core, we didn't reach any consensus (http://s.apache.org/HadoopContribDiscussion). Do we want to revive that discussion now? Or do we want to keep the status quo, imitate the source structure of the present-day tools and contrib, but move them to appropriate Maven modules, and then have that discussion separately?

I personally prefer the latter, given the length and the eventual failure of the previous discussion.

HADOOP-7590 is a related issue where the src location of contribs like gridmix, streaming etc. is being talked about. I suppose that issue and this thread ought to converge.

Thanks,
+Vinod
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Eric Yang 2011-09-07, 17:50
MapReduce and HDFS are distinct functions of Hadoop. They are loosely coupled. If we have a tools aggregator module, it will not have as clearly distinct a function as the other Hadoop modules. Hence, it is possible for a tool to depend on both HDFS and MapReduce. If something breaks in the tools module, it is unclear which subproject is responsible for maintaining the tools. Therefore, it is safer to send tools to the Incubator or Apache Extras rather than deposit the utility tools in a tools subcategory. There are many short-lived projects that attempt to associate themselves with Hadoop but are not being maintained. It would be better to spin off those utility projects than to use Hadoop as a dumping ground.

In the previous discussion about removing contrib, most people were in favor of doing so, and only a few contrib owners were reluctant to remove contrib. Fewer people have participated in restoring the functionality of broken contrib projects. History speaks for itself.

-1 (non-binding) for hadoop-tools.

regards,
Eric
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Alejandro Abdelnur 2011-09-07, 18:18
Agreed, we should not have a dumping ground. IMO, what would go into hadoop-tools (i.e. distcp, streaming, and someone could argue for FsShell as well) are effectively Hadoop CLI utilities. Having them in a separate module rather than in the core modules (common, hdfs, mapreduce) does not mean that they are secondary things; it is just modularization. Also, it will help to get those tools to use the public interfaces of the core modules, and when we finally have a clean hadoop-client layer, those tools should depend only on that.

Finally, the fact that tools would end up under trunk/hadoop-tools does not prevent the HDFS and MAPREDUCE packaging from bundling the same or different tools.

+1 for hadoop-tools/ (not binding)

Thanks.
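For illustration only, here is a minimal sketch of what a top-level hadoop-tools aggregator POM could look like under this layout. It is not the committed POM. It assumes the parent is `hadoop-project` (mirroring the suggestion earlier in the thread that the hadoop-tools POM would resemble the hadoop-common-project POM); the version string and `relativePath` are placeholders chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Assumption: the shared parent POM is hadoop-project, one level up. -->
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <!-- Placeholder version, not taken from the thread. -->
    <version>0.23.0-SNAPSHOT</version>
    <relativePath>../hadoop-project</relativePath>
  </parent>

  <artifactId>hadoop-tools</artifactId>
  <name>Apache Hadoop Tools</name>

  <!-- "pom" packaging makes this an aggregator: it builds no jar itself. -->
  <packaging>pom</packaging>

  <!-- Each tool is its own child module, e.g. trunk/hadoop-tools/hadoop-distcp. -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

With this shape, adding or removing a tool is a one-line change in `<modules>`, which is also the natural place to drop a tool that is broken at release time.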
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Mahadev Konar 2011-09-07, 18:27
I like the idea of having tools as a separate module, and I don't think that it will be a dumping ground unless we choose to make it one.

+1 for hadoop tools module under trunk.

thanks
mahadev
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Milind.Bhandarkar@... 2011-09-07, 18:32
+1 for a separate hadoop-tools module. However, if a tool is broken at release time, and no one comes forward to fix it, it should be removed. (I.e., unlike contrib modules, where build and test failures were tolerated.)

- milind
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Alejandro Abdelnur 2011-09-07, 18:35
Makes sense
-
RE: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Rottinghuis, Joep 2011-09-08, 03:43
Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?
Thanks,
Joep
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Amareshwari Sri Ramadasu 2011-09-08, 04:33
It is good to have a separate hadoop-tools module. But as I asked before, we need to answer some questions here. I'm trying to answer them myself; comments are welcome.

> > 1. Should the patches for tools be created against Hadoop Common?

Here I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for tools? I agree with Vinod that we can tie it to the Hadoop Common JIRA and mailing lists.

> > 2. What will happen to the tools test automation? Will it run as part of Hadoop Common tests?

Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this. Also, I propose that every patch build of HDFS and MAPREDUCE should also run the tools tests to make sure nothing is broken. That would ease the maintenance of the hadoop-tools module. I presume the tools tests should not take much time (something like no more than 30 minutes).

> > 3. Will it introduce a dependency from MapReduce to Common? Or is this
> > taken care of in Mavenization?

I'm not sure whether Mavenization can take care of this.

Thanks
Amareshwari

On 9/8/11 9:13 AM, "Rottinghuis, Joep" <[EMAIL PROTECTED]> wrote:

Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep
________________________________________
From: Alejandro Abdelnur [[EMAIL PROTECTED]]
Sent: Wednesday, September 07, 2011 11:35 AM
To: [EMAIL PROTECTED]
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Makes sense

On Wed, Sep 7, 2011 at 11:32 AM, <[EMAIL PROTECTED]> wrote:

> +1 for separate hadoop-tools module. However, if a tool is broken at
> release time, and no one comes forward to fix it, it should be removed.
> (i.e. unlike contrib modules, where build and test failures were
> tolerated.)
>
> - milind
>
> On 9/7/11 11:27 AM, "Mahadev Konar" <[EMAIL PROTECTED]> wrote:
>
> >I like the idea of having tools as a separate module and I don't think
> >that it will be a dumping ground unless we choose to make one of it.
> >
> >+1 for hadoop tools module under trunk.
> >
> >thanks
> >mahadev
> >
> >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <[EMAIL PROTECTED]>
> >wrote:
> >> Agreed, we should not have a dumping ground. IMO, what would go into
> >> hadoop-tools (i.e. distcp, streaming, and someone could argue for
> >> FsShell as well) are effectively hadoop CLI utilities. Having them in
> >> a separate module rather than in the core modules (common, hdfs,
> >> mapreduce) does not mean that they are secondary things, just
> >> modularization. Also it will help to get those tools to use public
> >> interfaces of the core modules, and when we finally have a clean
> >> hadoop-client layer, those tools should only depend on that.
> >>
> >> Finally, the fact that tools would end up under trunk/hadoop-tools
> >> does not prevent the packaging from HDFS and MAPREDUCE from bundling
> >> the same/different tools.
> >>
> >> +1 for hadoop-tools/ (not binding)
> >>
> >> Thanks.
> >>
> >>
> >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
> >>
> >>> Mapreduce and HDFS are distinct functions of Hadoop. They are loosely
> >>> coupled. If we have a tools aggregator module, it will not have as
> >>> clear and distinct a function as the other Hadoop modules. Hence, it
> >>> is possible for a tool to depend on both HDFS and map reduce. If
> >>> something breaks in the tools module, it is unclear which subproject
> >>> is responsible for maintaining the tools. Therefore, it is safer to
> >>> send tools to incubator or apache extra rather than deposit the
> >>> utility tools in a tools subcategory. There are many short-lived
> >>> projects that attempt to associate themselves with Hadoop but are
> >>> not being maintained. It would be better to spin off those utility
> >>> projects than use Hadoop as a dumping ground.
> >>>
> >>> The previous discussion for removing contrib, most people were in
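
On the proposal in this message that every HDFS and MAPREDUCE patch build should also run the tools tests: one way to picture it is "test the changed module plus everything that depends on it". The sketch below is only an illustration in Python. The module names come from the thread, but the dependency edges and the `modules_to_test` helper are assumptions made for the example, not the actual Hadoop POMs or Jenkins configuration.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
# These edges are illustrative assumptions, not the real Hadoop build files.
DEPENDS_ON = {
    "hadoop-common": [],
    "hadoop-hdfs": ["hadoop-common"],
    "hadoop-mapreduce": ["hadoop-common", "hadoop-hdfs"],
    "hadoop-tools": ["hadoop-common", "hadoop-hdfs", "hadoop-mapreduce"],
}


def modules_to_test(changed, depends_on=DEPENDS_ON):
    """Return the changed module plus every direct or transitive dependent."""
    # Invert the graph so we can walk from a module to its dependents.
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)

    # Breadth-first walk over the inverted graph.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Under the assumed graph, an HDFS patch also triggers the tools tests.
    print(modules_to_test("hadoop-hdfs"))
```

Under these assumed edges, a patch to hadoop-hdfs or hadoop-mapreduce pulls in the hadoop-tools tests, while a patch to hadoop-tools only runs its own tests.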
-
RE: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Rottinghuis, Joep 2011-09-09, 05:25
If hadoop-tools will be built as part of hadoop-common, then none of these tools should be allowed to have a dependency on hdfs or mapreduce.
The converse is also true: when tools do have any such dependency, they cannot be built as part of hadoop-common. We cannot have circular dependencies like that.

That is probably obvious, but I'm just saying...

Joep
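
The circular-dependency concern raised at the top of this message can be checked mechanically. The following is a minimal Python sketch, not anything from the Hadoop build: the `find_cycle` helper and both example graphs are hypothetical, and the edges are assumptions chosen to mirror the two layouts being discussed (tools folded into hadoop-common versus tools as a top-level module).

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {module: WHITE for module in depends_on}
    path = []

    def visit(module):
        state[module] = GREY
        path.append(module)
        for dep in depends_on.get(module, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[module] = BLACK
        return None

    for module in depends_on:
        if state[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Assumed layout 1: tools live inside hadoop-common and one tool needs HDFS,
# so hadoop-common gains an edge to hadoop-hdfs.
tools_inside_common = {
    "hadoop-common": ["hadoop-hdfs"],
    "hadoop-hdfs": ["hadoop-common"],
    "hadoop-mapreduce": ["hadoop-common", "hadoop-hdfs"],
}

# Assumed layout 2: tools are a separate top-level module.
tools_top_level = {
    "hadoop-common": [],
    "hadoop-hdfs": ["hadoop-common"],
    "hadoop-mapreduce": ["hadoop-common", "hadoop-hdfs"],
    "hadoop-tools": ["hadoop-hdfs", "hadoop-mapreduce"],
}

if __name__ == "__main__":
    print(find_cycle(tools_inside_common))
    print(find_cycle(tools_top_level))
```

With these assumed edges the first layout yields a cycle through hadoop-common and hadoop-hdfs, while the top-level layout has none, which is the argument for keeping hadoop-tools out of hadoop-common whenever a tool depends on hdfs or mapreduce.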
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Vinod Kumar Vavilapalli 2011-09-12, 13:47
Alright, I think we've discussed this enough, and everybody seems to agree on a top-level hadoop-tools module.

Time to get into action. I've filed HADOOP-7624. Amareshwari, we can track the rest of the implementation-related details, and the answers to your specific questions, there.

Thanks everyone for putting in your thoughts here.

+Vinod
-
Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)Alejandro Abdelnur 2011-10-18, 19:41
Following up on this one: the hadoop-tools/ module is already in trunk, so the distcp v2 addition can start.

Thanks.

Alejandro