Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # dev - Thoughts about large feature dev branches


Copy link to this message
-
Re: Thoughts about large feature dev branches
Jonathan Hsieh 2012-09-06, 11:08
On Wed, Sep 5, 2012 at 4:43 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:

> On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <[EMAIL PROTECTED]
> >wrote:
>
> > - I would suggest that since hbase's code base moves so rapidly, a
> > rebased branch should probably be a requirement before merging.
> > Otherwise the merge will get pretty interesting for very long lived
> > branches.
> >
>
> IIRC when Todd was working on some large stuff for HDFS he was doing this
> in a feature branch every few days. Seriously helps with when things are
> actually finished in terms of rolling it back in.
>
> Using github to keep a constantly rebased version (every few days) would be
> a reasonble, super-low friction way of solving the problem for
> non-committers. Further, for big changes, it would ensure that if the
> people go away we aren't left with a bunch of dangling branches in the svn.
> Problem here is also establishing the 'master' branch in github, though
> that can be established on a case-by-case basis with the people involved.
>
> Have a work branch and then have the "committed" branch in github?
>  >
> > On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <[EMAIL PROTECTED]>
> wrote:
> > > This has been brought up in the past but we are here again.
> > >
> > > We have a few large features that are hanging out and having a hard
> time
> > > because trunk changes underneath it and in some cases because they are
> > > being worked by folks without a commit bit.   (ex: snapshots w/ Jesse
> and
> > > Matteo, and have some other potentially in the pipeline -- major
> > assignment
> >
>
> I'm generally opposed to doing feature branches for a variety of reasons
> (left behind functionality, hard to roll back in, difficulty of testing,
> etc) and further don't really feel its really necessary for the snapshot
> code given that the code doesn't touch all that much of the current
> codebase.
>
> I realize it is hard to split all the code up but it is also hard and time
consuming to review the design and implementation with a single 300k patch
that has many modes.

I'll post some other more concrete suggestions there.
> A lot of the pain with it right now is that the code has been broken into 5
> patches, making it hard to build a version of HBase that has snapshots 'in
> its current form'. This gets even worse as I'm planning on doing a bit more
> refactoring into a couple more patches to help make it more digestable
> (e.g. see latest patch for 3PC https://reviews.apache.org/r/6592/ which
> pulls out a lot of the coordination functionality)). This helps with
> reviews, etc, but makes it a bit of a pain for people who want to do
> advanced testing on the feature - hard to justify doing a lot of that work
> though as if the code is changing a lot, then testing doesn't make much
> sense.
>
> My hope with splitting parts of it was that we could peal off and commit
portions we are agree upon and have something solid and committed to build
the rest on top of.  If also keeps the subsystems decoupled which is a good
thing; this limits the scope of what needs to be understood, so that
reviewing/modifying it only requires localized knowledge instead of global
knowledge. This process would likely add tests at the internals layers and
make these subsystems testable.

>From a reviewer's point of view -- I'm going to ignore the things built on
top until we get the building blocks to a place where it is easy to follow.
> In terms of how the work is breaking down, with Matteo doing restore on top
> of the taking that I'm working on, his part clearly depends on the taking
> of snapshots. However, the filesystem layout hasn't changed at all in
> nearly the last two months, meaning the work can proceed pretty much
> independently (more or less).
>
> This is one case where the design docs did a really good job of defining
the structures ahead of time.  The spec is laid out and we just need to
confirm that the code puts things in those places. :)
> > > manager changes with Jimmy and possibly me,

There is another philosophy out there -- commit it but have the feature
disabled by default until it has been stabilized.  We've tried this in a
few places (wal compress, off heap cache, instant schema change) and these
features unfortunately haven't gone through the "stabilitzation".  Maybe
after Lars's push for more frequent releases proves it self,  we could go
into that camp.  I personally don't like the idea of features that are in
but doesn't always work or doesn't work in conjunction with other perfectly
reasonable seemingly orthogonal features.
be things we can't really do incrementally.   These should be rare.
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]