
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
I personally am for splitting up the projects.  I think there is a lot of
potential that each of the projects could have on their own, and I expect
to see them evolve in new and interesting ways when the projects are not
tied directly together.

But, in order to get there we need to address the issues that made the
first split attempt fail.  First off, we need to look at all API calls that
MR, YARN, or HDFS make into common that are not @Stable, and either promote
them to @Stable or remove the need for those calls.  Second, while we are
doing that, we need to look at the visibility of those APIs.  How many APIs
really need to be @LimitedPrivate, or should they be @Public?  How many of
the APIs have no designation at all?  Third, get truly serious about
maintaining binary compatibility on @Stable APIs.  Fourth, we need to start
splitting the projects up, starting with common.  I think it would be cool
to call it liBig, but I digress.  Once common has been split out and is on
its own for a few releases, we start splitting out HDFS, YARN, and
MapReduce.  For each of those we need to do a similar audit between the
projects and fix the interdependencies between them.  This is mostly
dependencies between YARN and MR.
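
As a rough illustration of the audit described above, here is a minimal
sketch that classifies a Java source string by its audience and stability
annotations and flags anything that is unannotated or not @Stable.  The
function names (classify, needs_review) and the regex-based matching are
assumptions made for illustration only; a real audit would walk the source
tree and most likely inspect the compiled classes rather than raw text.

```python
import re

# Annotation names follow Hadoop's InterfaceAudience / InterfaceStability
# classification. Matching them with regexes over raw source text is an
# assumption of this sketch, not how any existing Hadoop tool works.
AUDIENCE = re.compile(r"@(?:InterfaceAudience\.)?(Public|LimitedPrivate|Private)\b")
STABILITY = re.compile(r"@(?:InterfaceStability\.)?(Stable|Evolving|Unstable)\b")


def classify(java_source):
    """Return (audience, stability) for a source string; None where missing."""
    audience = AUDIENCE.search(java_source)
    stability = STABILITY.search(java_source)
    return (
        audience.group(1) if audience else None,
        stability.group(1) if stability else None,
    )


def needs_review(java_source):
    """Flag sources that have no audience designation or are not @Stable."""
    audience, stability = classify(java_source)
    return audience is None or stability != "Stable"
```

Running needs_review over every class in common that MR, YARN, or HDFS
references would give a first-pass worklist of APIs to either promote to
@Stable or stop calling.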

As part of this we also need to have a clear set of rules about what it
takes to become a committer or PMC member for the new projects when they
split off.  I am fine with all committers becoming PMC members, but if we
merge the lists now and simply say all previous committers become
committers on the new TLPs there will be a lot of committers/PMC members
that have no real desire to be on those projects.  I would propose that we
merge the committer lists, but all committers on the current project
receive an invitation to become a committer on the new projects.  ATM
convinced me that committers know their boundaries and will self-censor.
I believe that many committers will decline to become committers on the
new projects either because it is out of their area of expertise or
because they are not involved with Hadoop any more, and will ignore the

I fear that just voting and doing an svn copy -m will result in the same
thing that happened last time.  Someone will want to make a large change.
This will require making a change to something in common, but because it
cannot easily be done in a backwards compatible way, or it will take three
steps to complete the change instead of one, we will get frustrated.  If
this happens enough we will really get frustrated and try to merge the
projects back together again.  This is because the projects are too
tightly coupled together right now to really have them stand on their own.
Just look at all of the security and token work that has been done
recently.  It has touched every single project and it has been a bit of
a nightmare.  It would be even worse if the projects were completely split

I also want us to think about the timing of this.  Do we really want to do
this before 2.0 is GA?  Doing this properly is probably going to be a
several month effort for one or two people, and a concerted effort by
everyone not to break things while they work.  If we have to rearchitect
something so that the APIs can be marked stable it may be a lot longer
than that.  Is it worth pushing the GA of 2.0 off by an entire quarter?
For me I would say yes, but I know others have different opinions, and
different schedules.


I can see your desire to do the split now, and then deal with the fallout
as we adapt to the changes.  I think that would work assuming that we all
are completely committed to making the changes necessary.  But the fact
that we are having this discussion at all seems to indicate that we are not
all completely committed to this, and I also feel that dealing with the
fallout is going to take a lot longer if we don't try to address some of
the problems up front.  Putting on my Yahoo! hat, I want to avoid as many
problems and delays as I can, because my customers want a stable release
of Hadoop with the features that are in 2.0.  The longer it is delayed the
longer we stay on branch-0.23.  A one-quarter delay because of this I am
sure I can swing; more than that and I will start to get more pressure to
pull in new features, which will probably mean that we then have to fork,
which is something that I really do not want to do.

So I am +1 on merging the committer list, and +1 splitting the projects.
I would encourage us to at least do some planning and legwork up front
before splitting.  I am even +1 on setting a deadline date on which the svn
-m will happen whether we are ready or not.
On 8/28/12 10:50 PM, "Alejandro Abdelnur" <[EMAIL PROTECTED]> wrote: