Hadoop >> mail # dev >> Hadoop 0.19.1


Re: Hadoop 0.19.1

On Feb 4, 2009, at 3:38 AM, Steve Loughran wrote:

> Sanjay Radia wrote:
> >
> > On Feb 2, 2009, at 4:23 PM, Konstantin Shvachko wrote:
> >
> >>
> >>  >  What do you recommend?
> >>
> >> In general. There may be people/organizations, which will not
> >> compromise on the reduced functionality in favor of the stability,
> >> this is understandable.
> >> I would propose to create a separate (unofficial experimental)
> >> branch, which would track changes like HADOOP-4379. The branch may
> >> later either die when the main stream is fixed or be merged with the
> >> trunk if the changes proved to be stable.
> >>
> >
> >
> > This is a very interesting suggestion.
> > Many in the team have come to the conclusion that complex projects
> > like append should be done on a separate branch in the first place
> > and integrated with trunk when the project is stable.
> >
>
> There's a lot to be said for branching; I'm also looking at git so I
> can do my service lifecycle stuff under SCM properly.
>
> but the cost of merging can be high. I'd estimate 1 morning/week is
> spent updating my local SVN and then seeing that everything still
> works. If hudson could both test the branches and test any merged
> branches, life would be better
>

I agree on the cost of merging.
When a project is branched, after a while one can spend as much as
30% of cycles merging in changes.
But when a system is used in production to store data, we cannot
afford to have users lose their data.
The team at Yahoo had to scramble to recover the lost data and put in
several emergency patches to deal with the append code.

I am all for extending Hudson testing to branches, but Hudson testing,
while helpful, will not be sufficient for big projects because Hudson
does not have a comprehensive set of tests.
Each new release is tested significantly beyond the Hudson tests.

For me the lesson is that large, complex projects should be branched.
(This is how commercial software products are engineered.)
There will be an increased cost to the project team, but overall the
community will have more solid releases, and the total cost to the
community of delivering the technology will be smaller.

sanjay
>
>
> The other problem is incompatible branches: the more branches you have
> live, the higher the merge cost.
>
> That said, Git promises wonderful things, and we ought to be able to
> set up Apache support for git for people wanting to do their own
> branches -svn would still be the official SCM tool
>
