[DISCUSS] - YARN as a sub-project of Apache Hadoop
Arun C Murthy 2012-07-26, 01:40

Folks,

It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.

It's exciting to see various open-source communities (both in the ASF and externally) start to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.

Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond its humble beginnings.

Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.

I'd like to clarify that this proposal *does not* mean we move the code base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.

Thoughts?

----

What does it mean to the Hadoop developer community?

# Project dependencies

The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As today, the dependencies *do not change*:
- Common is the base
- HDFS depends only on Common
- YARN depends only on Common & HDFS
- MapReduce depends on Common, HDFS & YARN.

# Jira & Mailing lists

We would have a separate YARN jira project and a yarn-dev@ mailing list.

We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a change.

# Subversion

Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.

Essentially the change would be:
$ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
... and the necessary, albeit small, changes to our maven build infrastructure.

# Release Cycles

No changes.

YARN would be co-released with Common, HDFS & MapReduce, as is the case today.

thanks,
Arun

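
As a rough illustration of the dependency list in the proposal (this is not part of the original message), the four sub-projects form a simple chain. Below is a minimal Python sketch, assuming Python 3.9+ for the standard-library `graphlib` module; the `DEPENDENCIES` mapping and the `build_order` helper are hypothetical names introduced here and do not correspond to anything in Hadoop's actual build.

```python
from graphlib import TopologicalSorter

# Sub-project -> sub-projects it depends on, as listed in the proposal.
# The mapping and helper names are illustrative, not part of Hadoop's build.
DEPENDENCIES = {
    "Common": set(),
    "HDFS": {"Common"},
    "YARN": {"Common", "HDFS"},
    "MapReduce": {"Common", "HDFS", "YARN"},
}


def build_order(deps):
    """Return the sub-projects ordered so each comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDENCIES)))
```

Because every sub-project depends on all of the ones before it, only one ordering is valid: Common, HDFS, YARN, MapReduce. That is consistent with the point that adding YARN as a fourth sub-project leaves the existing dependencies unchanged.
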
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Edward J. Yoon 2012-07-26, 02:09

> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.

+1

--
Best Regards, Edward J. Yoon
@eddieyoon

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mahadev Konar 2012-07-26, 04:44

+1 ....

mahadev

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 02:03

Hi Arun,

IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator. Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer list, and affiliations and mentors/champions ;)

If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right? If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it (it = the software and the community) how you guys see fit.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Arun C Murthy 2012-07-26, 02:11

Hi Chris,

On Jul 25, 2012, at 7:03 PM, Mattmann, Chris A (388J) wrote:

> Hi Arun,
>
> IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator.
> Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer
> list, and affiliations and mentors/champions ;)

Fair point, thanks for chiming in Chris. However, I think we should revisit that when everything in Apache Hadoop (Common, HDFS, YARN & MapReduce) can fly out of the nest as separate projects. That, I think, is too early; also, keeping Common, HDFS, YARN & MapReduce together has value in ensuring that Hadoop continues to move along at a fair clip.

> If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right?
> If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it
> (it = the software and the community) how you guys see fit.

Agreed. Which is why I'm trying to gather consensus among the Hadoop community.

thanks,
Arun

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 02:30

Hey Arun,

On Jul 25, 2012, at 7:11 PM, Arun C Murthy wrote:

>> IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator.
>> Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer
>> list, and affiliations and mentors/champions ;)
>
> Fair point, thanks for chiming in Chris. However, I think we should revisit that when everything in Apache Hadoop (Common, HDFS, YARN & MapReduce) can fly out of the nest as separate projects.

Yep the way I've seen them managed, IMHO, they should be separate projects.

> That, I think, is too early and also that keeping Common, HDFS, YARN & MapReduce together has value in ensuring that Hadoop continues to move along at a fair clip.

I realize I'm asking a hard question here: why *aren't* they separate projects? What's the barrier? They seem to be operating that way (and have been for a while). And I don't see how Hadoop still couldn't move along at a fair clip with them as official TLPs themselves.

>> If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right?
>> If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it
>> (it = the software and the community) how you guys see fit.
>
> Agreed. Which is why I'm trying to gather consensus among the Hadoop community.

Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone has a problem, *then* discuss? Dunno, that's just my opinion.

The things that you are proposing that are new (e.g., mailing lists) will serve to splinter (at least the discussion in) the community IMHO -- this is spoken from experience in 2 situations (Nutch, Lucene) where we had umbrella projects with tons of virtual "sub projects" that in the end have thrived as their own individual projects. If you are going to go that far, why not create a new Incubator project and just do it clean from the start?

Cheers,
Chris

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Aaron T. Myers 2012-07-26, 06:16

On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:

> I realize I'm asking a hard question here: why *aren't* they separate projects? What's the barrier? They seem to be operating that way (and have been for a while). And I don't see how Hadoop still couldn't move along at a fair clip with them as official TLPs themselves.

I'm opposed to this if for no other reason than that it makes it difficult to make logically-individual changes which span the projects. As much as we might like it to be the case, it is not presently true that Common is so independent and stable from HDFS and MR/YARN that Common could reasonably be separate and have its own release schedule. I think this view is supported by the fact that we once had separate SVN repos for Common, HDFS, and MR, but we undid that because having to make coordinated commits across the several repos, and the complex build dependencies it induced, was too onerous.

The main reason I'm opposed to making them separate projects is that I don't think their internal interfaces are so stable that they could reasonably release independently. Though we've been pretty good at maintaining the stability of the external interfaces, we routinely make changes in the internal interfaces of Common/HDFS/MR that make the projects fairly tightly-coupled. Note that Arun's proposal specifically calls out that the sub-projects would still release together, which I support.

> Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone has a problem, *then* discuss? Dunno, that's just my opinion.

I for one really appreciate Arun having this discussion beforehand. Making a change like this, even if it ends up being uncontroversial, will at least be quite disruptive to the developers working on Hadoop daily. I think it's great that Arun sought out feedback first to make sure folks agree that it's a worthwhile change to make.

> The things that you are proposing that are new (e.g., mailing lists) will serve to splinter (at least the discussion in) the community IMHO -- this is spoken from experience in 2 situations (Nutch, Lucene) where we had an umbrella projects with tons of virtual "sub projects" that in the end have thrived as their own individual projects. if you are going to go that far, why not create a new Incubator project and just do it clean from the start?

We recently discussed (and approved) merging all of the Hadoop *-user@ mailing lists, so as to not splinter the user community, and make the project more approachable for users. In my experience, I've seen most developers (myself included) subscribe to all of the *-dev@ mailing lists. Even though I personally subscribe to all of them, I still prefer to have them separate, so that I can easily set up email filters/labels.

--
Aaron T. Myers
Software Engineer, Cloudera

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 15:00

Hey Aaron,

On Jul 25, 2012, at 11:16 PM, Aaron T. Myers wrote:

>> I realize I'm asking a hard question here: why *aren't* they separate projects? What's the barrier? They seem to be operating that way (and have been for a while). And I don't see how Hadoop still couldn't move along at a fair clip with them as official TLPs themselves.
>
> I'm opposed to this if for no other reason than that it makes it difficult to make logically-individual changes which span the projects. As much as we might like it to be the case, it is not presently true that Common is so independent and stable from HDFS and MR/YARN that Common could reasonably be separate and have its own release schedule. I think this view is supported by the fact that we once had separate SVN repos for Common, HDFS, and MR, but we undid that because having to make coordinated commits across the several repos, and the complex build dependencies it induced, was too onerous.

Fair enough.

> The main reason I'm opposed to making them separate projects is that I don't think their internal interfaces are so stable that they could reasonably release independently. Though we've been pretty good at maintaining the stability of the external interfaces, we routinely make changes in the internal interfaces of Common/HDFS/MR that make the projects fairly tightly-coupled. Note that Arun's proposal specifically calls out that the sub-projects would still release together, which I support.

Sub projects are not a good thing at Apache. Well, "official" sub projects that have their own committees, mailing lists, etc. You guys aren't talking about sub projects (though you call them that) -- in reality you are talking about *products* that the Apache Hadoop PMC releases. They may have different names, be on different release schedules, have different mailing lists even (which I still think is not the right thing to do), but they are not *projects*.

I guess that's one thing that got me confused with Arun's original proposal: in it there is talk of different sub-*projects* and making YARN a new sub-*project* and discussion of it and Map Reduce and each attracting a diverse (implied: different) community.

If you guys are talking about *products* that themselves have different *communities* then pretty much at Apache those are different *projects*. If you are talking about different *products* that themselves have *the same community* who releases those *products* then we are talking about a single *project* at Apache that has different *products* that it releases (am I confusing you yet?) :)

Regardless, I guess in the end what I was questioning was that if you look at the net of Arun's proposal minus Project Dependencies (which is really code level things -- at Apache code is one thing, but we are dealing with *communities*), and Release Cycles (no changes), the proposal boils down to:

1. Creating separate mailing lists for YARN
2. an svn mv command

My advice on #1 was be careful on splitting mailing lists, I've seen that cause trouble (even before Hadoop existed and in other Apache projects I've cited), and then on #2, why not execute the svn mv command and just move forward? You all are on the Hadoop PMC and I assume trust Arun (and that he trusts you guys since you've given each other the commit bit), so move forward on it.

As for #2, your point about being happy Arun brought this up as it would have impact on the build cycle/etc etc., that makes sense and is a good reason to DISCUSS it.

>> Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone

Yep thanks. This is good validation for #2 above then.

Yeah, that's cool. I do the same myself and that makes sense. It just seemed like a formal proposal to create a project, minus the creating project thing, so I thought I'd ask.

Cheers,
Chris

-
Re: [DISCUSS] - YARN as a sub-project of Apache HadoopAaron T. Myers 2012-07-26, 17:20
Hi Chris,
On Thu, Jul 26, 2012 at 8:00 AM, Mattmann, Chris A (388J) < [EMAIL PROTECTED]> wrote: > Sub projects are not a good thing at Apache. Well, "official" sub projects > that have their own committees, mailing lists, etc. You guys aren't talking > about sub projects (though you call them that) -- in reality you are > talking > about *products* that the Apache Hadoop PMC releases. They may have > different names, be on different release schedules, have different mailing > lists even (which I still is not the right thing to do), but they are not > *projects*. <snip> > Yea, sounds like we have a bit of a terminology problem here. We've always called them "sub-projects", but in fact they're all managed by a single PMC, released as a single artifact, live in a single source repository, will soon have a single user mailing list, and have a largely overlapping set of committers. The things they do maintain separately are *-dev@ /*-issues@/*-commits@ mailing lists, and separate "JIRA projects." I think these separations are worth maintaining. Anyway, I think that having totally separate TLPs may one day make sense, but I think it would be premature to do so now. Thanks for the discussion, Chris. -- Aaron T. Myers Software Engineer, Cloudera +
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 17:40
Thanks Aaron, makes total sense.

Take care!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Robert Evans 2012-07-26, 14:28
+1 for what Aaron said. The projects are not ready to split yet. MAPREDUCE-3300 for example. YARN cannot display a UI for aggregated container logs unless we also have the MR History Server up and running. If we do want to split all of the projects HDFS, COMMON, YARN, and MAPREDUCE it will take some feature and design work to get the APIs to a point that there are no more @LimitedPrivate APIs. I personally would like to see this happen eventually, but it is not something on my priority list.

--Bobby Evans
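For readers unfamiliar with the annotation Bobby mentions: Hadoop marks APIs that are meant only for named sibling projects with an audience annotation (`@InterfaceAudience.LimitedPrivate`). The sketch below is a minimal, self-contained illustration of that idea and is not Hadoop's code. `LimitedPrivateSketch`, `LogAggregationSketch` and `AudienceAudit` are hypothetical names, and the stand-in annotation is given runtime retention purely so the example can find annotated methods with reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an audience annotation; names the projects allowed to call the API. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface LimitedPrivateSketch {
    String[] value();
}

/** Hypothetical YARN-side class with one method reserved for MapReduce. */
class LogAggregationSketch {
    @LimitedPrivateSketch({"MapReduce"})
    void readAggregatedLogs() { }

    void publicOperation() { }
}

class AudienceAudit {
    /** Returns the sorted names of methods that are still limited to specific sibling projects. */
    static List<String> limitedPrivateMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isAnnotationPresent(LimitedPrivateSketch.class)) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Calling `AudienceAudit.limitedPrivateMethods(LogAggregationSketch.class)` lists the methods that would have to become properly public (or disappear) before the projects could release independently, which is one rough way to size the design work described above.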
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 15:00
Thanks for your comments Bobby, makes sense.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Suresh Srinivas 2012-07-26, 17:09
+1 from me.

The main question is, is this a good idea without considering the details of how easy/hard it is to do? I think it is a good idea and we should move in this direction. If we all agree on this, let's discuss the main issues that need to be resolved to split YARN into a separate project. As others have suggested, we should ensure this is done smoothly and does not disrupt the project and does not make day to day work for contributors very hard.

http://hortonworks.com/download/
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Arun C Murthy 2012-07-27, 03:20
Looks like the feedback has been very positive, I'll start a vote to formalize it.

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Zizon Qiu 2012-07-27, 03:41
Why not rename MAPREDUCE to YARN, since in Hadoop 2.0 MR2 is an implementation of YARN?
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Harsh J 2012-07-27, 05:58
Hi Zizon,

MR is still MR, while YARN is a resource scheduler (generic, agnostic of 'MR'). MR1 ran over JobTracker and TaskTrackers, while MR2 runs from an AM and runs tasks via YARN. It would not make sense to rename MR to YARN as these are separate things, and calling YARN "MR2" only adds to the confusion.

--
Harsh J
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Steve Loughran 2012-07-27, 19:01
One more thing:

I think the service lifecycle stuff (inner start/stop methods) is actually a layer below YARN and could go into common, though there are some things I'd like to fix there first (the state machine doesn't let you stop without starting, the implementation's state checks happen after subclasses execute their start/stop transitions, &c.). There is no reason why other services such as the NN and DN can't adopt the same lifecycle, and it would unify some management operations to have a consistent state view of all Hadoop services.
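To make the two lifecycle complaints concrete, here is a minimal sketch of a service state machine that permits `stop()` from any state and validates each transition before the subclass hook runs. It is an illustration under assumptions, not Hadoop's actual service code: the class name `LifecycleSketch`, the state names, and the `innerStart()`/`innerStop()` hooks are all hypothetical, and calling `innerStop()` only for services that were actually started is one possible design choice among several.

```java
/** Hypothetical sketch of a service lifecycle; not Hadoop's actual service classes. */
class LifecycleSketch {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    synchronized State getState() {
        return state;
    }

    synchronized void init() {
        requireState(State.NOTINITED, "init");
        state = State.INITED;
    }

    synchronized void start() {
        // Validate the transition before any subclass code runs.
        requireState(State.INITED, "start");
        innerStart();
        state = State.STARTED;
    }

    synchronized void stop() {
        if (state == State.STOPPED) {
            return; // stopping twice is a no-op
        }
        boolean wasStarted = (state == State.STARTED);
        state = State.STOPPED;
        if (wasStarted) {
            innerStop(); // only undo work that innerStart() actually did
        }
    }

    /** Subclass hooks; empty by default. */
    protected void innerStart() { }

    protected void innerStop() { }

    private void requireState(State expected, String operation) {
        if (state != expected) {
            throw new IllegalStateException(
                "Cannot " + operation + " from state " + state);
        }
    }
}
```

With this shape, a service that failed during construction or `init()` can still be stopped cleanly, and a management layer can read one consistent `State` from every service regardless of which sub-project it lives in.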
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Tom White 2012-07-26, 14:23
On Wed, Jul 25, 2012 at 9:40 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and externally) start to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.
>
> Thoughts?

+1 to the direction.

>
> ----
>
> What does it mean to the Hadoop developer community?
>
> # Project dependencies
>
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As today, the dependencies *do not change*:
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS
> - MapReduce depends on Common, HDFS & YARN.

To be clear, these are runtime dependencies - YARN and MapReduce should not have any compile-time dependencies on HDFS. See MAPREDUCE-4147 and MAPREDUCE-4148.

>
> # Jira & Mailing lists
>
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>
> We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a change.
>
> # Subversion
>
> Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.
>
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build infrastructure.

It would be good to eliminate the resulting redundant level in the hierarchy at the same time: i.e. hadoop-mapreduce-project/hadoop-mapreduce-client -> hadoop-mapreduce-project.

Cheers,
Tom

>
> # Release Cycles
>
> No changes.
>
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
>
> thanks,
> Arun
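The dependency layering quoted in this message can be written down as data and checked mechanically. The following is a small sketch, assuming only the four sub-project names and the "depends on" list from the proposal; the class `SubProjectLayers` and its `buildOrder()` method are hypothetical names, and the ordering uses a simple repeated pass rather than any build tool's real algorithm. Per Tom's note, the edges model runtime dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the sub-project layering described in the proposal. */
class SubProjectLayers {
    /** Each sub-project mapped to the sub-projects it depends on at runtime. */
    static final Map<String, List<String>> DEPENDS_ON = new LinkedHashMap<>();

    static {
        DEPENDS_ON.put("Common", Collections.<String>emptyList());
        DEPENDS_ON.put("HDFS", Arrays.asList("Common"));
        DEPENDS_ON.put("YARN", Arrays.asList("Common", "HDFS"));
        DEPENDS_ON.put("MapReduce", Arrays.asList("Common", "HDFS", "YARN"));
    }

    /** Returns an order in which every sub-project comes after all of its dependencies. */
    static List<String> buildOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < DEPENDS_ON.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> entry : DEPENDS_ON.entrySet()) {
                String project = entry.getKey();
                if (!done.contains(project) && done.containsAll(entry.getValue())) {
                    done.add(project);
                    order.add(project);
                    progressed = true;
                }
            }
            if (!progressed) {
                // No project could be placed, so the remaining ones depend on each other.
                throw new IllegalStateException("Cyclic dependency among sub-projects");
            }
        }
        return order;
    }
}
```

`SubProjectLayers.buildOrder()` yields Common, HDFS, YARN, MapReduce, which matches the proposal's claim that adding YARN as a sub-project leaves the existing layering unchanged.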
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Alejandro Abdelnur 2012-07-26, 15:10
+1 on moving hadoop-yarn to trunk/ level. As part of that, can we flatten the internal hierarchy so there are not multiple nested modules within the hadoop-yarn module? Just one level, as in common, hdfs & tools? This will make the build more consistent and will allow us to consolidate logic in the POMs. This flattening would also apply to MR modules.

Also, does this mean we'll be creating a new JIRA project 'YARN'? My problem with the current multi-project approach is that you cannot do umbrella JIRAs with subtasks spanning different projects; all subtasks must be in the same project. Does anybody know if there is a config in JIRA to enable cross-project subtasks within a set of projects?

Thx.

Alejandro
Alejandro Abdelnur 2012-07-26, 15:10
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Steve Loughran 2012-07-26, 17:02
On 26 July 2012 08:10, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> As part of that, can we flatten
> the internal hierarchy so there are not multiple nested modules within
> hadoop-yarn module? just one level as in common, hdfs & tools? this will
> make the build more consistent and will allow to consolidate logic in the
> POMs. This flattening would also apply to MR modules.

You need to start a project using Gradle as its build tool. Your life will be better, and you can stop worrying about how Maven handles things.

Otherwise, +1 to doing something about the POMs, though that's very much an artifact of Maven's world view. Bigtop is similarly complex.
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Luke Lu 2012-07-26, 17:55
+1. Probably should've done so when we mavenized the whole thing :)
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Steve Loughran 2012-07-26, 16:59
On 25 July 2012 18:40, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and
> externally) start to explore integration with YARN such as Apache Hama,
> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our
> hopes of making Apache Hadoop a much more general data processing platform
> (& storage, of course) and not tied to MapReduce alone for processing data.
> Furthermore, we already have people contributing interesting prototypes
> such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of
> Apache Hadoop along with Common, HDFS & MapReduce. I believe this would
> help other communities realize that they could consider using YARN as a
> general-purpose resource management layer and help us enhance YARN beyond
> its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code
> base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.

If the goal is to clearly partition the scheduling layer from the app layer, and you think it helps isolate changes, then yes, +1.

Forcing that strict hierarchy does ensure that you really do have a clean separation of modules, and emphasises that it is more than just MapRed; as people add more applications, I can see that the separation would get their needs addressed.

Having a separate project could also allow Yarn to do a point release in sync with those other projects, as well as do co-ordinated releases with Hadoop itself.

It should also make clear that Yarn is designed to be a topology-aware underpinning of a datacentre, interesting in its own right. Which reminds me, I'd better get my topology stuff in.

-Steve
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Jun Ping Du 2012-07-26, 23:03
+1. Separating YARN will definitely take some work, but it deserves it.

Thanks,
Junping
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Ahmed Radwan 2012-07-26, 20:32
Thanks Arun! +1, this organization makes sense.

Also, what will be the strategy for applications other than MapReduce going forward? Will they be part of YARN or separate sub-projects like MapReduce? They now live inside hadoop-yarn-applications. I think they can remain there, and once they are mature enough, they can become either separate sub-projects or even TLPs, depending on how large and independent they are.

Thoughts?

Best Regards
Ahmed
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Doug Cutting 2012-07-26, 21:17
+1. This would be an improved layering of components.

As others have noted, we should probably stop using the term "subproject" for these, as that's most often used at Apache for things that are released independently. Better terms might be "components" or "modules". Addressing that might also require restructuring the website.

Doug
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Hitesh Shah 2012-07-26, 20:58
+1.
-- Hitesh
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Finger, Jay 2012-07-26, 17:15
I'm not sure what the goal of that is. If this is an Apache organizational/political thing, then I am oblivious.

If the point is that YARN should not be a subproject of MapReduce, then I agree completely. Any argument for making YARN a subproject of MR could equally be made for making it a subproject of MPI, Spark, etc. And obviously it cannot be a subproject of all of them.

To that end, YARN should be a peer of core and hdfs. I prefer that MR remain a peer of those as well, but since the current approach seems to prefer over-factoring things into painfully deep hierarchies, the consistent thing to do would be to make MR a subproject of YARN (blech). I prefer simple flat trees, though.

jay
-
Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Thomas Graves 2012-07-26, 20:07
+1 for the idea. I think separating the framework from the MR application makes sense.

Tom