HBase >> mail # dev >> Thoughts about large feature dev branches


Re: Thoughts about large feature dev branches
On Wed, Sep 5, 2012 at 3:58 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> +1 on git, either on github or closer to the linux model with real
> distributed repos.
>
> - I've been using it for just about all of my development and it works
> pretty nicely.  I push everything to github as I'm working.  Then I
> squash commits and create a diff to post on jira.
>

I do the same, just locally. Solid model.

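The squash-and-diff workflow described above can be sketched as a small Python script that drives the `git` command line through `subprocess`. This is a minimal illustration that assumes `git` is on the PATH. The branch names `main` and `feature`, the file `Snapshot.java`, and the helpers `commit_file`, `squash_and_diff` and `demo` are hypothetical. The script builds a throwaway repository, makes two work-in-progress commits on a feature branch, squashes them into one commit with `git reset --soft`, and returns the single diff you would post on JIRA.

```python
import pathlib
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_file(repo, name, content, message):
    """Write `content` to `name` and commit it (hypothetical demo helper)."""
    (pathlib.Path(repo) / name).write_text(content)
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)


def squash_and_diff(repo, base="main", feature="feature"):
    """Squash all commits made on `feature` since it forked from `base` into
    a single commit, then return that commit's diff against the fork point."""
    git(repo, "checkout", "-q", feature)
    fork_point = git(repo, "merge-base", base, feature).strip()
    # Rewind the branch to the fork point while keeping every change staged.
    git(repo, "reset", "-q", "--soft", fork_point)
    git(repo, "commit", "-q", "-m", "Squashed feature work")
    # Three-dot diff: changes on `feature` since its merge base with `base`.
    return git(repo, "diff", f"{base}...{feature}")


def demo():
    """Build a throwaway repo with two WIP commits, squash them, and return
    (number of commits left on the branch, patch text)."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        # Pin the trunk name regardless of the local init.defaultBranch setting.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
        git(repo, "config", "user.name", "Example Dev")
        git(repo, "config", "user.email", "dev@example.com")
        git(repo, "config", "commit.gpgsign", "false")
        commit_file(repo, "README", "trunk\n", "Initial trunk commit")

        git(repo, "checkout", "-q", "-b", "feature")
        commit_file(repo, "Snapshot.java", "// wip 1\n", "WIP 1")
        commit_file(repo, "Snapshot.java", "// wip 2\n", "WIP 2")

        patch = squash_and_diff(repo)
        count = int(git(repo, "rev-list", "--count", "main..feature"))
        return count, patch


if __name__ == "__main__":
    count, patch = demo()
    print(count)  # 1: the two WIP commits are now a single commit
    print(patch)  # unified diff, ready to attach to a JIRA issue
```

Using `reset --soft` rather than an interactive rebase keeps the sketch non-interactive. The reviewer only ever sees the final state of the work, not the intermediate WIP steps.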
> - I would suggest that since hbase's code base moves so rapidly, a
> rebased branch should probably be a requirement before merging.
> Otherwise the merge will get pretty interesting for very long lived
> branches.
>

IIRC, when Todd was working on some large changes for HDFS, he rebased his
feature branch every few days. That seriously helps when things are actually
finished and it's time to roll them back in.

Using github to keep a constantly rebased version (every few days) would be
a reasonable, super-low-friction way of solving the problem for
non-committers. Further, for big changes, it would ensure that if the
people involved go away, we aren't left with a bunch of dangling branches in
svn. The other problem here is establishing the 'master' branch in github,
though that can be settled on a case-by-case basis with the people involved.
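The "rebase every few days" habit can be sketched the same way. This is a minimal illustration that assumes `git` is on the PATH. The branch names, file names, and the helpers `refresh_feature_branch` and `demo` are hypothetical. In the sketch, trunk moves on after the feature branch forks, and `git rebase` replays the feature commits on top of the new trunk tip, so merging the branch back becomes a fast-forward. It also assumes the rebase applies cleanly. On a real branch, conflicts would have to be resolved by hand, and doing that every few days keeps each batch small.

```python
import pathlib
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_file(repo, name, content, message):
    """Write `content` to `name` and commit it (hypothetical demo helper)."""
    (pathlib.Path(repo) / name).write_text(content)
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)


def is_on_top_of(repo, trunk, feature):
    """True when `trunk`'s tip is an ancestor of `feature` (fast-forwardable)."""
    return git(repo, "merge-base", trunk, feature) == git(repo, "rev-parse", trunk)


def refresh_feature_branch(repo, trunk="main", feature="feature"):
    """Replay `feature` on top of the current `trunk` tip."""
    git(repo, "checkout", "-q", feature)
    git(repo, "rebase", "-q", trunk)
    return is_on_top_of(repo, trunk, feature)


def demo():
    """Return (on top before rebase, on top after rebase, feature history)."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
        git(repo, "config", "user.name", "Example Dev")
        git(repo, "config", "user.email", "dev@example.com")
        git(repo, "config", "commit.gpgsign", "false")
        commit_file(repo, "README", "trunk v1\n", "Trunk v1")

        git(repo, "checkout", "-q", "-b", "feature")
        commit_file(repo, "Snapshot.java", "// snapshot work\n", "Snapshot work")

        # Meanwhile, trunk changes underneath the feature branch.
        git(repo, "checkout", "-q", "main")
        commit_file(repo, "README", "trunk v2\n", "Trunk v2")

        before = is_on_top_of(repo, "main", "feature")
        after = refresh_feature_branch(repo)
        history = git(repo, "log", "--format=%s", "feature").splitlines()
        return before, after, history


if __name__ == "__main__":
    print(demo())
```

Before the rebase, the branch still hangs off the old trunk commit. Afterwards, its history is linear: the feature commit sits directly on top of both trunk commits.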

>
> On Wed, Sep 5, 2012 at 11:38 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > This has been brought up in the past but we are here again.
> >
> > We have a few large features that are hanging out and having a hard time
> > because trunk changes underneath it and in some cases because they are
> > being worked by folks without a commit bit.   (ex: snapshots w/ Jesse and
> > Matteo, and we have some others potentially in the pipeline -- major
> > assignment
>

I'm generally opposed to feature branches for a variety of reasons
(left-behind functionality, difficulty rolling them back in, difficulty of
testing, etc.) and, further, don't feel one is really necessary for the
snapshot code, given that it doesn't touch much of the current codebase.

A lot of the pain right now is that the code has been broken into five
patches, which makes it hard to build a version of HBase that has snapshots
'in its current form'. This will get even worse, since I'm planning to
refactor it into a couple more patches to make it more digestible (e.g. see
the latest patch for 3PC, https://reviews.apache.org/r/6592/, which pulls
out a lot of the coordination functionality). This helps with reviews,
etc., but it's a bit of a pain for people who want to do advanced testing
on the feature. Then again, it's hard to justify doing a lot of that
testing while the code is still changing a lot.

In terms of how the work breaks down: Matteo is doing restore on top of the
snapshot taking that I'm working on, so his part clearly depends on taking
snapshots. However, the filesystem layout hasn't changed at all in nearly
two months, so the work can proceed more or less independently.

> > manager changes with Jimmy and possibly me,
>

This touches a lot more of the codebase, which makes a branch (either in the
sandbox or otherwise) more feasible.

> > HBASE-4120, HBASE-2600,
> > removing root)
>

Salesforce is planning to tackle at least the latter two in the next few
months, so this is something we need to figure out :)
> >
> > Though I wasn't around yet, it seems like this is what we did for
> > coprocs/security, probably for the 0.90 master.
> >
> http://search-hadoop.com/m/byzZYZMktx1/hbase+windows&subj=Re+Proposed+feature+branch+for+HBase+security
> >
> > Were the folks working on those features committers at the time? What do
> > we do for contributions from folks who aren't committers yet?
> >
> > This was proposed over on hadoop-general by Todd -- what do you all think
> > about doing something like this for the major changes?  (Github seems
> > easiest, svn seems "more official").
> >
> > Here's one proposal, making use of git as an easy way to allow

Overall, this seems reasonable, though I can imagine merging the work back
in being a huge pain. It would be great if we could break these big changes
down into smaller patches and roll them in one at a time, both to ease the
load on a single committer and to help ensure the quality of each
sub-piece; it's easier to enforce good testing on smaller pieces, and it
helps with code reuse.

My comments above obviously contradict this a little: it's a huge pain to
work on the end functionality when the sub-pieces you are building on shift
due to code reviews. In the end it leads to a better foundation, but it can
be a headache to keep everything in sync.

That headache goes away somewhat if we have a single branch with the
majority of the code and then make progressive commits to fix things, but
that first massive code drop is still terrible to review (pot calling the
kettle black here).

TL;DR: prefer smaller, independently useful patches that build toward the
bigger change. That may not be possible for some features, but it should
make the change easier to review, roll in, and ultimately merge, while
being more generally useful along the way.

This seems a little excessive. The more 'official' status it confers would
be nice, but it seems to create more friction than it's worth (IMO).

TL;DR: github with 'official' branches per umbrella JIRA seems a
low-friction way to do feature branches without the possibility of cruft in
the main repository. We should really be sure that we need a branch, though,
and still favor smaller patches along the same branch for generally useful
features.

Jesse Yates
@jesse_yates
jyates.github.com