Hadoop >> mail # general >> bringing the codebases back in line


Re: bringing the codebases back in line

On Oct 22, 2010, at 10:36 AM, Konstantin Shvachko wrote:

> Milind's point is valid: the PMC cannot demand or control what Yahoo,
> Facebook, et al. run in production, or what Cloudera sells to its
> customers, AS LONG AS it is within the Apache licensing requirements.
>
> What Apache Hadoop can and should provide is a *steady* stream of base
> A-releases.
>
> I think the single fact that we failed to release Hadoop 0.21 late last
> year got us into the state we are in now, as it let different Hadoop
> installations diverge drastically from each other, whether for
> production or commercial reasons.
>
> Given where we are now, it would be neither feasible nor worthwhile to
> find a common denominator based on the old 0.20 version, unless we want
> to spend another year looking for it and diverge the individual
> installations even more in the process.
>
> So the question, IMO, is not "how do we merge the Cloudera and Yahoo
> distributions" but when and how we make the new 0.22 release, and how we
> provide a steady release cycle after that.

+1

sanjay
>
> --Konstantin
>
> On Thu, Oct 21, 2010 at 9:29 PM, Milind A Bhandarkar
> <[EMAIL PROTECTED]> wrote:
>
>>>>
>>> Right, the trunk is not for production use. I wasn't suggesting that.
>>
>> So, what are you suggesting? That the Yahoo distribution of Hadoop
>> should *not* be the version we run on our production clusters?
>>
>>>
>>> But the trunk is what will eventually become the next release.
>>
>>>
>>> Then someone at Yahoo will have to decide whether to rebase their
>>> production clusters on 0.21 or just continue backporting what they
>>> need to the version they are running on their clusters.
>>
>> Yes, that is what we do now. If there are committed patches in trunk
>> that do not scale to our needs, break existing applications, or are
>> deemed not worth the effort needed to backport, we do not include them
>> in our deployments, and therefore do not include them in the Yahoo
>> distribution.
>>
>>>
>>> And if Yahoo fixes a bug in its version, the fix would need to be
>>> forward-ported to the current trunk, which will get harder and harder
>>> as the paths diverge.
>>
>> Yes, indeed. So care must be taken that the paths do not diverge too
>> much. I have seen some cases where bug fixes did not need to be
>> forward-ported, because that piece of code had been completely
>> rewritten in trunk.
>>
>>>
>>> I'm sure you've seen it happen on other projects when a major branch
>>> lands on the trunk, and the amount of effort it takes to reconcile them.
>>
>> Yes. And that results in delayed releases. An unexpected benefit for
>> application developers was that they could spend time adding features
>> to their applications, rather than porting the same applications from
>> release to release and validating releases. So it's not always bad.
>>
>> - Milind
>>
>>
>> --
>> Milind Bhandarkar
>> (mailto:[EMAIL PROTECTED])
>> (phone: 408-203-5213 W)