-
DistCpV2 in 0.23
Amareshwari Sri Ramadasu 2011-08-25, 04:51
Hi,

As you may have seen, DistCpV2 is up at https://issues.apache.org/jira/browse/MAPREDUCE-2765. The code will go into a new contrib project. It has full unit test coverage and proper documentation. I really liked this part of the patch. :)
The patch has been reviewed and is ready for commit. I would like to have it in the 0.23 branch. Since it is all new code, I think it should be fine. Moreover, the code has been in production at Yahoo! for six months.

Let me know if you have any issues.

Clearly +1 for putting this in 0.23.

Thanks
Amareshwari
-
Re: DistCpV2 in 0.23
Jean-Daniel Cryans 2011-08-25, 16:56
Contribs are hard to follow and maintain. If this is really a rewrite, shouldn't it be in the core code?

BTW, nice to see that distcp is being reengineered; love the new features listed in that JIRA.

Thx,

J-D
-
Re: DistCpV2 in 0.23
Eli Collins 2011-08-25, 17:36
Nice work! I definitely think this should go in 23 and 20x.

Agree with JD that it should be in the core code, not contrib. If it's going to be maintained then we should put it in the core code.

Thanks,
Eli
-
Re: DistCpV2 in 0.23
Todd Lipcon 2011-08-25, 17:58
On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <[EMAIL PROTECTED]> wrote:
> Nice work! I definitely think this should go in 23 and 20x.
>
> Agree with JD that it should be in the core code, not contrib. If
> it's going to be maintained then we should put it in the core code.

Now that we're all mavenized, though, a separate maven module and artifact does make sense IMO - ie "hadoop jar hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
-
Re: DistCpV2 in 0.23
Alejandro Abdelnur 2011-08-25, 18:04
Agree, it should be a separate maven module.

And it should be under hadoop-mapreduce-client, right?

And now that we are on the topic, the same should go for streaming, no?

Thanks.

Alejandro
-
Re: DistCpV2 in 0.23
Mahadev Konar 2011-08-25, 19:09
+1 for a separate module in hadoop-mapreduce-project. I think hadoop-mapreduce-client might not be the right place for it. We might have to pick a new maven module under hadoop-mapreduce-project that could host streaming/distcp/hadoop archives.

thanks
mahadev
-
Re: DistCpV2 in 0.23
Todd Lipcon 2011-08-25, 19:21
Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go in there as well - ie tools that are downstream of MR and/or HDFS.

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: DistCpV2 in 0.23
Amareshwari Sri Ramadasu 2011-08-26, 04:06
Agree. It should be a separate maven module (and the patch puts it in a separate maven module now). A top level for hadoop tools is nice to have, but it becomes hard to maintain until patch automation runs the tests under tools. Currently we often see changes in HDFS affecting RAID tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.

I propose we can have something like the following:

trunk/
 - hadoop-mapreduce
     - hadoop-mr-client
     - hadoop-yarn
     - hadoop-tools
         - hadoop-streaming
         - hadoop-archives
         - hadoop-distcp

Thoughts?

@Eli and @JD, we did not replace the old legacy distcp because this is really a complete rewrite, and we did not want to remove it until users are familiar with the new one.
-
Re: DistCpV2 in 0.23
Alejandro Abdelnur 2011-08-26, 04:15
I'd suggest putting hadoop-tools either at trunk/ level or having a tools aggregator module for hdfs and another for common.

I personally would prefer it at trunk/.

Thanks.

Alejandro
-
Re: DistCpV2 in 0.23
Mahadev Konar 2011-08-26, 04:17
+1 for the layout!

thanks
mahadev
-
Re: DistCpV2 in 0.23
Giridharan Kesavan 2011-08-26, 06:34
+1 to Alejandro's

I prefer to keep the hadoop-tools at trunk level.

-Giri
-
Re: DistCpV2 in 0.23
Mithun Radhakrishnan 2011-08-26, 12:45
Would it be acceptable if retooling of tools/ were taken up separately? It sounds to me like this might be a distinct (albeit related) task.

Mithun
-
Re: DistCpV2 in 0.23
Robert Evans 2011-08-26, 14:15
I agree with Mithun. They are related, but this goes beyond distcpv2 and should not block distcpv2 from going in. It would be very nice, however, to get the layout settled soon so that we all know where to find something when we want to work on it.

Also +1 for Alejandro's. I also prefer to keep tools at the trunk level.

Even though HDFS, Common, and Mapreduce, and perhaps soon tools, are separate modules right now, there is still tight coupling between the different pieces, especially with tests. IMO until we can reduce that coupling we should treat building and testing Hadoop as a single project instead of trying to keep them separate.

--Bobby
-
Re: DistCpV2 in 0.23
Eli Collins 2011-08-26, 14:39
On Friday, August 26, 2011, Amareshwari Sri Ramadasu <[EMAIL PROTECTED]> wrote:
> @Eli and @JD, we did not replace old legacy distcp because this is really a
> complete rewrite and did not want to remove it until users are familiarized
> with new one.

That makes sense, we have a similar situation w hftp and hoop. The new distcp shouldn't be contrib is my only input.

Thanks,
Eli
-
Re: DistCpV2 in 0.23
Amareshwari Sri Ramadasu 2011-08-26, 16:37
Agree with Mithun and Robert. DistCp and Tools restructuring are separate tasks. Since the DistCp code is ready to be committed, it need not wait for the Tools separation from MR/HDFS.
I would say it can go into contrib as the patch is now, and when the tools restructuring happens it would be just an svn mv. If there are no issues with this proposal I can commit the code tomorrow.

Thanks
Amareshwari
-
Re: DistCpV2 in 0.23Alejandro Abdelnur 2011-08-26, 16:47
Please, don't add more Mavenization work on us (eventually I want to go back to coding).

Given that Hadoop is already Mavenized, the patch should be Mavenized.

What will have to be done extra (besides Mavenizing distcp) is to create a hadoop-tools module at root level and within it a hadoop-distcp module.

The hadoop-tools POM will look pretty much like the hadoop-common-project POM.

The hadoop-distcp POM should follow the hadoop-common POM patterns.

Thanks.

Alejandro
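The layout Alejandro describes (a root-level hadoop-tools aggregator that contains a hadoop-distcp module) could look roughly like the POM below. This is only a sketch, not the POM from the MAPREDUCE-2765 patch: the parent coordinates, the relative path, and the version string are assumptions made for illustration, and the real values would have to mirror whatever the hadoop-common-project POM uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical hadoop-tools/pom.xml: an aggregator module with "pom" packaging.
     Parent coordinates, relativePath and version are assumed, not taken from the patch. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Assumed parent; it should match the parent used by hadoop-common-project. -->
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
    <relativePath>../hadoop-project</relativePath>
  </parent>

  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-tools</artifactId>
  <version>0.23.0-SNAPSHOT</version>
  <!-- "pom" packaging: this module builds no artifact of its own, it only groups children. -->
  <packaging>pom</packaging>

  <!-- Each tool lives in its own child directory with its own POM. -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

For this to be picked up by the main build, the root POM would also need a `<module>hadoop-tools</module>` entry. Other tools mentioned in the thread, such as streaming or archives, would be added as further `<module>` lines if they are moved later.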
-
Re: DistCpV2 in 0.23Alejandro Abdelnur 2011-08-26, 16:48
And I'll be more than happy to review it from the Mavenization perspective.

Thxs.

Alejandro
-
Re: DistCpV2 in 0.23Mithun Radhakrishnan 2011-08-26, 17:17
Greetings, Tucu. I'd like very much to take you up on that.

DistCpV2's build is currently mavenized. (Apologies. I neglected to mention that in this mail-thread.) Could I please bother you to review the pom? As the patch stands now, DistCpV2 needs building separately.

Grazie,
Mithun