Hadoop >> mail # general >> bringing the codebases back in line


Re: bringing the codebases back in line
Milind's point is valid: the PMC cannot demand or control what Yahoo,
Facebook, et al. run in production, or what Cloudera sells to its
customers, AS LONG AS it is within the Apache licensing requirements.

What Apache Hadoop can and should provide is a *steady* stream of base
A-releases.

I think the single fact that we failed to release Hadoop 0.21 late last
year got us into the state we are in now, as it let different Hadoop
installations diverge drastically from each other, whether for
production or commercial reasons.

Now that we are in this state, it would be neither feasible nor worthwhile
to find the common denominator based on the old 0.20 version, unless we
want to spend another year looking for it while the individual
installations diverge even more in the process.

So the question, IMO, is not "how do we merge the Cloudera and Yahoo
distributions", but when and how we make the new 0.22 release,
and how we provide a steady release cycle after that.

--Konstantin

On Thu, Oct 21, 2010 at 9:29 PM, Milind A Bhandarkar
<[EMAIL PROTECTED]> wrote:

> >>
> > right.. the trunk is not for production use.  I wasn't suggesting that.
>
> So, what are you suggesting? That the Yahoo distribution of Hadoop should
> *not* be the version we run on our production clusters?
>
> >
> > but the trunk is what will eventually become the next release.
>
> >
> > Then someone in yahoo will have to decide if they are going to move to
> > rebase their production cluster to 0.21, or just continue back-porting
> > what they need to the version they are running on their clusters.
>
> Yes, that is what we do now. If there are committed patches in trunk that
> do not scale for our needs, or break existing applications, or are deemed
> not worth the effort needed to backport, we do not include them in our
> deployments, and therefore do not include them in the Yahoo distribution.
>
> >
> > and if yahoo fixes a bug in their version, it would need to be
> > forward-ported over to the current trunk. which will get harder and
> > harder as the paths diverge.
>
> Yes, indeed. So, care must be taken that paths do not diverge too much. I
> have seen some cases where the bug fixes did not need to be forward ported,
> because that piece of code was completely re-written in trunk.
>
> >
> > I'm sure you've seen it happen on other projects when a major branch
> > lands on the trunk, and the amount of effort it takes to reconcile them.
>
> Yes. And that results in delayed releases. An unexpected benefit for
> application developers was that they could spend time adding features to
> their applications, rather than porting the same applications from
> release to release and validating releases. So, it's not always bad.
>
> - Milind
>
>
> --
> Milind Bhandarkar
> (mailto:[EMAIL PROTECTED])
> (phone: 408-203-5213 W)
>
>
>