-
[VOTE] Should we create sub-projects for HDFS and Map/Reduce? Owen O'Malley 2008-08-06, 05:18
I think the time has come to split Hadoop Core into three pieces:

1. Core (src/core)
2. HDFS (src/hdfs)
3. Map/Reduce (src/mapred)

There will be lots of details to work out, such as what we do with tools and contrib, but I think it is a good idea. This will create separate jiras and mailing lists for HDFS and map/reduce, which will make the community much more approachable. I would propose that we wait until 0.19.0 is released to give us time to plan the split.

-- Owen
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Dhruba Borthakur 2008-08-06, 05:38
Are you talking about sub-projects for core, hdfs and mapreduce? Or is there another way to allow for having separate mailing lists/jiras for these components?

I had liked the fact that these pieces are together. It makes the code compile together, keeps API upgrades simple and enhances developer community building across all three pieces of code. Actually, JIRAs have their own components and we can always filter them using their component, can't we?

thanks,
dhruba
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Owen O'Malley 2008-08-06, 06:12
On Aug 5, 2008, at 10:38 PM, Dhruba Borthakur wrote:
> Are you talking about sub-projects for core, hdfs and mapreduce?

Sorry, I wasn't clear. Yes, I mean creating new sub-projects. There will clearly be overlap between the sub-projects in terms of committers. In fact, all current Hadoop committers would be on all 3. It just feels like getting 100+ emails a day is overwhelming and most of the developers specialize on one side or the other. So the division seems pretty natural to me.

-- Owen
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Samuel Guo 2008-08-06, 09:45
It sounds like a good idea.
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Doug Cutting 2008-08-06, 17:07
+1

I agree that it is time to do this. Should we start using Ivy, so that the inter-dependencies are easier to manage?

Doug
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Dhruba Borthakur 2008-08-06, 17:19
+1.

-dhruba
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Dhruba Borthakur 2008-08-06, 20:29
What about releases? Does this mean that each sub-project will be released separately? If so, then the life of an administrator becomes even harder :-). He has to pick and choose each package, verify whether they are compatible with one another, run various installation utilities to install them, etc.

-dhruba
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Arun C Murthy 2008-08-06, 20:36
+1

Arun
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? lohit 2008-08-06, 20:48
On a similar note, would Core continue to hold the things that are not part of HDFS and Map/Reduce? Would it still be called Core? Map/Reduce and HDFS are supposed to be the 'core' of Hadoop, right? :)

-Lohit
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Doug Cutting 2008-08-07, 21:30
Dhruba Borthakur wrote:
> What about releases? Does this mean that each sub-project will be
> released separately?

Yes. Although we might coordinate and release in waves. It would increase the importance of back-compatibility.

> If so, then the life of an administrator becomes
> even more harder :-). he has to pick and choose each package, verify
> whether they are compatible with one another, run various installation
> utilities to install them, etc.etc.

We intend to use Ivy to assist with compatibility. It would be silly to release a new X version A that requires Y version B if Y version B has not yet been released. So upgrading should not be any harder than it is today: to upgrade mapreduce you might need to upgrade hdfs to a compatible version. Today you always have to upgrade both. So, in some cases, upgrades should get easier, since not everything need be upgraded at once.

Doug
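
The release rule in the message above ("X version A" should not ship before "Y version B" exists) can be illustrated with a small check. This is only a sketch and not part of the original message: the class name `ReleaseCheck`, the `name-version` string format, and the version numbers are all hypothetical, and it does not use Ivy or reflect how Ivy resolves dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReleaseCheck {
    /**
     * Returns the required artifacts that have not been released yet.
     * An empty result means the candidate release is safe to publish.
     * Artifacts are plain "name-version" strings (a hypothetical format).
     */
    static List<String> missingDependencies(List<String> requires, Set<String> released) {
        List<String> missing = new ArrayList<>();
        for (String dep : requires) {
            if (!released.contains(dep)) {
                missing.add(dep);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical state of the world: what has already shipped.
        Set<String> released = new HashSet<>(Arrays.asList("core-0.20.0", "hdfs-0.20.0"));

        // A hypothetical mapred release candidate that needs a newer hdfs.
        List<String> mapredNeeds = Arrays.asList("core-0.20.0", "hdfs-0.21.0");

        // Prints the dependencies that block the release.
        System.out.println(missingDependencies(mapredNeeds, released));
    }
}
```

With these inputs the candidate is blocked by the unreleased hdfs version, which is the situation the message calls "silly" to allow.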
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Nigel Daley 2008-08-07, 23:22
So we'll need to create and maintain 3 patch processes, one for each component? Not a trivial amount of work given the way the patch process is currently structured.

How will unit tests be divided? For instance, will all three have to have MiniDFSCluster and other shared test infrastructure?

We can use Ivy now to manage dependencies on outside libraries.
We can build separate jars for mapred, hdfs, and core right now.
We can use email filters to reduce inbox emails.
We can use TestNG to categorize our tests and narrow the number of unit tests run for each component.

-1 until I better understand the benefit of making the split.

Nige
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Dhruba Borthakur 2008-08-08, 05:24
I too am "-1" on this one. I think we should try to see if some of the pieces in contrib deserve being a subproject but we can keep hdfs and map-reduce together. I think it reduces complexity from a deployment perspective too.

If we can use the "component" of the JIRA in the subject of the email, then it lends itself to very easy email filtering. Does anyone know (or can anyone advise me) how to make the "component" part of the email subject?

thanks,
dhruba
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Tom White 2008-08-08, 09:08
+1

I was initially concerned about the overhead of having to install separate packages for each component, but in some ways it will make things clearer. Folks on the user list are often asking how to use HDFS by itself for instance - or even if it is possible. By splitting it up it would make it clear that HDFS and MapReduce can be used without the other (although of course, they are best used together).

Also, I can see some benefit from having separate configuration for HDFS and MapReduce, since it will make the configuration files smaller and more manageable (something like hdfs-(default|site).xml, mapreduce-(default|site).xml).

It's not totally clear to me how Core fits into this. It's just a jar file and doesn't have daemons, so it should be bundled with the MapReduce and HDFS releases, shouldn't it?

Nigel Daley <[EMAIL PROTECTED]> wrote:
> How will unit tests be divided? For instance, will all three have to have
> MiniDFSCluster and other shared test infrastructure?

Today the tests for core, hdfs and mapred are under one source tree because they are so tightly intertwined. I think the goal should be to have independent unit tests for each module, as well as integration tests that test that MapReduce works with HDFS. We should do this even if we don't split the projects.

Tom
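
The per-component default/site file pairs suggested in the message above rely on layering: site values override shipped defaults. A minimal sketch of that layering follows, not part of the original message and not Hadoop's actual `Configuration` class. The class name `LayeredConfig` is hypothetical, and the property names and values are used for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

public class LayeredConfig {
    /**
     * Merges two layers of settings: every default is kept unless the
     * site layer defines the same key, in which case the site value wins.
     */
    static Map<String, String> resolve(Map<String, String> defaults, Map<String, String> site) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(site);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for an hdfs-default.xml layer (illustrative values).
        Map<String, String> hdfsDefault = new HashMap<>();
        hdfsDefault.put("dfs.replication", "3");
        hdfsDefault.put("dfs.block.size", "67108864");

        // Stand-in for an hdfs-site.xml layer: one local override.
        Map<String, String> hdfsSite = new HashMap<>();
        hdfsSite.put("dfs.replication", "2");

        Map<String, String> effective = resolve(hdfsDefault, hdfsSite);
        System.out.println("dfs.replication = " + effective.get("dfs.replication"));
        System.out.println("dfs.block.size = " + effective.get("dfs.block.size"));
    }
}
```

Keeping one such pair per component means an HDFS-only deployment would carry no Map/Reduce settings at all, which is the manageability gain the message points at.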
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Doug Cutting 2008-08-08, 20:06
Nigel Daley wrote:
> How will unit tests be divided? For instance, will all three have to
> have MiniDFSCluster and other shared test infrastructure?

The HDFS project can release an hdfs-test.jar file that contains MiniDFSCluster. This will be used by mapred tests. Similarly, mapred will release a mapred-test.jar that contains MiniMRCluster, which can be used by hdfs tests. There is a circular dependency, but only in the test code, not in the mapred or hdfs code itself. This is easy to enforce, since test code is not on the classpath when we compile non-test code.

> -1 until I better understand the benefit of making the split.

One benefit is that developers would spend less time reading messages about areas they're not interested in. The core-dev mailing list traffic is becoming unmanageable. Splitting these without splitting the project would mean that a split developer community would attempt to build a coherent product, which sounds dangerous.

Another benefit is that it would increase the separation of these technologies, so that, e.g., folks could more easily run different versions of mapreduce on top of different versions of HDFS. Currently we make no such guarantees. Folks would be able to upgrade to, e.g., the next release of mapreduce on a subset of their cluster without upgrading their HDFS. That's not currently supported. As we move towards splitting mapreduce into a scheduler and runtime, where folks can specify a different runtime per job, this will be even more critical.

We need to make this split eventually. Why not now?

Doug
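
The claim above, that the cycle exists only among test artifacts, can be checked mechanically by looking for cycles in two separate dependency graphs. The sketch below is illustrative and not part of the original message: the class name `DependencyCycles` is hypothetical, and the edges from hdfs and mapred to core are an assumption about the non-test code. Only the two test-jar edges come from the message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyCycles {
    /** Depth-first search: returns true if the directed graph contains a cycle. */
    static boolean hasCycle(Map<String, List<String>> graph) {
        Set<String> done = new HashSet<>();    // nodes fully explored
        Set<String> onPath = new HashSet<>();  // nodes on the current DFS path
        for (String node : graph.keySet()) {
            if (visit(node, graph, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> graph,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;   // back edge: cycle found
        if (done.contains(node)) return false;    // already cleared
        onPath.add(node);
        for (String next : graph.getOrDefault(node, Collections.emptyList())) {
            if (visit(next, graph, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // Non-test code: assumed here to depend only on core.
        Map<String, List<String>> mainDeps = new HashMap<>();
        mainDeps.put("hdfs", Arrays.asList("core"));
        mainDeps.put("mapred", Arrays.asList("core"));

        // Test code: each side uses the other's mini-cluster jar.
        Map<String, List<String>> testDeps = new HashMap<>();
        testDeps.put("hdfs-test", Arrays.asList("mapred-test"));
        testDeps.put("mapred-test", Arrays.asList("hdfs-test"));

        System.out.println("main code cyclic: " + hasCycle(mainDeps));
        System.out.println("test code cyclic: " + hasCycle(testDeps));
    }
}
```

A build could run a check like the first one as a gate on non-test code while tolerating the second, which mirrors the classpath-based enforcement the message describes.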
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Nigel Daley 2008-08-16, 05:03
> Another benefit is that it would increase the separation of these
> technologies, so that, e.g., folks could more easily run different
> versions of mapreduce on top of different versions of HDFS.
> Currently we make no such guarantees. Folks would be able to
> upgrade to, e.g., the next release of mapreduce on a subset of their
> cluster without upgrading their HDFS. That's not currently
> supported. As we move towards splitting mapreduce into a scheduler
> and runtime, where folks can specify a different runtime per job,
> this will be even more critical.

Sounds like we simply need to create separate jar files for these different components. This can be done in the current project.

Wouldn't the amount of effort to make this split and get it right be better spent on getting all components of Hadoop to 1.0 (API stability)? The proposal feels like a distraction to me at this point in the project.

Nige
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Nigel Daley 2008-09-06, 00:21
I'd like to retract the -1 vote that I gave this proposal earlier. One compelling reason (for me) to split HDFS and Map/Reduce into separate sub-projects is that (hopefully) the *configs* for each layer will be clearer and simpler.

So I'm now +1 on this proposal.

Nige
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Dhruba Borthakur 2008-09-15, 22:03
+1

I would prefer to keep hdfs and mapreduce together because I believe that this arrangement catches incompatibility sooner rather than later. But with the coming of age of both these modules separately, I guess it is time to grow them on their own individual turf!

thanks,
dhruba
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Owen O'Malley 2008-09-15, 22:09
Ok, so in very protracted voting, the results are:

PMC +1's: Arun, Dhruba, Doug, Nigel, Owen, Tom

So the vote passes.

-- Owen
-
Re: [VOTE] Should we create sub-projects for HDFS and Map/Reduce? Tom White 2008-09-17, 09:43
BTW The initial work to sort out the module dependencies (but not actually split the projects) is being carried out in https://issues.apache.org/jira/browse/HADOOP-3750.

Tom