Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I could not agree more with everything Andrew has written below. Things
have been running really quite smoothly for months (a year?) now. We've had
one rather small disagreement, that we're about to have cleared up, and now
suddenly we're talking about rearranging the whole thing. I still fail to
see how this could serve to help Hadoop.

Aaron T. Myers
Software Engineer, Cloudera

On Thu, Aug 30, 2012 at 7:11 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> As a direct Apache software product consumer and sometimes contributor, I
> also experienced firsthand the pain of the project splits. It was not
> possible to build an installable release. It may have been many days or
> weeks before that was cured by a re-merge. I gave up after burning too many
> hours on it, went back to the 1.0 code base, and came back only after the
> damage was repaired.
> It's also frustrating to hear, even if just one person's proposal, that we
> have spent months preparing to stabilize our next production deployment
> based on the 2.0 branch, with the expectation that it will be the new
> stable, but now maybe 0.23 will be the new stable. 0.23 is quite backwards
> in comparison and missing all of the critical HA HDFS work.
> This thread seems to be becoming a competition for which is the more
> radical proposal to snatch defeat from the jaws of success.
> These proposals seem to be made with a total lack of care for the end user.
> From my point of view, things were going reasonably well until suddenly
> there was this turn into lunacy. I am positive this kind of
> "foundation" / PMC / project / administrivia tinkering is what will
> fragment or disband the Hadoop community of users and contributors, not
> disagreements between committers. A Hadoop competitor couldn't be happier.
> On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko
> > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
> > <[EMAIL PROTECTED]> wrote:
> > > OK I lied and said I wouldn't reply :)
> >
> > Long thread. I just picked one of Chris's emails at random (as he is
> > the initiator) to reply to.
> >
> > Chris,
> > You are basically saying there's been a history of community problems
> > in the Hadoop project,
> > and proposing a technical solution to split the project by replicating
> > the source base under three new names,
> > implying that this will solve the community problems we (the Hadoop
> > community) are facing.
> >
> > I see several issues.
> >
> > 1. There are other ways to split the project.
> > We essentially have a "natural" split of the project already in place.
> > Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk
> > are in a sense competing projects by themselves, with their own
> > contributors and release cycles.
> >
> > 2. From a technical (not community) viewpoint your "svn copy" is an ugly
> > approach,
> > as it creates a lot of code duplication and will result in a
> > maintenance nightmare and/or
> > will require many man-months to fix. My point is that you cannot
> > neglect "technical issues" when you solve community problems.
> >
> > 3. I am as skeptical as Todd that the community problems will be
> > solved by simply TLP-ing the three projects.
> > Two years ago Hadoop was in crisis as vendors were producing their own
> > releases and calling them Hadoop.
> > I think this was solved, but "poor community behavior" and contentions
> > remained, embrace them or not.
> >
> > 4. Having said the above, separating the projects seems reasonable.
> > (See timing though)
> > HDFS will inevitably have to inherit and maintain most of Common.
> > I totally understand the frustration of people who just put a huge effort
> > into merging
> > the sources back under a common root.
> >
> > 5. Timing is important.
> > Waiting until Hadoop 2 is stable as Arun suggested earlier would
> > probably be too long.
> > Doing it next week, without discussing and solving the technical issues
> > listed in the thread, would be premature.
> > I think Hadoop 0.23.3 release backed by Yahoo production has a