Hadoop >> mail # general >> Large feature development


Re: Large feature development
On 1 September 2012 09:20, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> Thanks for starting this thread, Steve. I think your points below are
> good. I've snipped most of your comment and will reply inline to one
> bit below:
>
> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
> <[EMAIL PROTECTED]> wrote:
>
>
> >
> > How then do we get (a) more dev projects working and integrated by the
> > current committers, and (b) a process in which people who are not yet
> > contributors/committers can develop non-trivial changes to the project
> > in a way that it is done with the knowledge, support and mentorship of
> > the rest of the community?
>
>
Both HDFS2 and MRv2 are in trunk, so I consider them successes.

> Here's one proposal, making use of git as an easy way to allow
> non-committers to "commit" code while still tracking development in
> the usual places:
>

This is effectively what people do. I'm less worried about the code side of
things than the integration and mentoring.

> - Upon anyone's request, we create a new "Version" tag in JIRA.
>

-1. There are enough versions. There is a "tag" field in JIRA for precisely
this purpose.

> - The developers create an umbrella JIRA for the project, and file the
> individual work items as subtasks (either up front, or as they are
> developed if using a more iterative model)
>

As today.

> - On the umbrella, they add a pointer to a git branch to be used as
> the staging area for the branch. As they develop each subtask, they
> can use the JIRA to discuss the development like they would with a
> normally committed JIRA, but when they feel it is ready to go (not
> requiring a +1 from any committer) they commit to their git branch
> instead of the SVN repo.
>

Some integration with Jenkins and pull testing would be good here.

> - When the branch is ready to merge, they can call a merge vote, which
> requires +1 from 3 committers, same as a branch being proposed by an
> existing committer. A committer would then use git-svn to merge their
> branch commit-by-commit, or if it is less extensive, simply generate a
> single big patch to commit into SVN.
>
> My thinking is that this would provide a low-friction way for people
> to collaborate with the community and develop in the open, without
> having to work closely with any committer to review every individual
> subtask.
>
> Another alternative, if people are reluctant to use git, would be to
> add a "sandbox/" repository inside our SVN, and hand out commit bit to
> branches inside there without any PMC vote. Anyone interested in
> contributing could request a branch in the sandbox, and be granted
> access as soon as they get an apache SVN account.
>
>
I don't see the technical issues with how the merge is done as the main
problem.

The barriers to getting your stuff in are
1. getting people to care enough to help develop the feature: mentorship,
collaborative development.
2. getting incremental parts in to avoid the continual
merge-regression-test hell that you go through if you are trying to keep a
separate branch alive. It's not the technical aspects of the merge so much
as the need to run all the hadoop tests and your own test suite, and track
down whether a failure is a regression in trunk or something in your code.

Jun's patch is an example of this situation. We haven't seen the effort he
and his colleagues have put into merging and testing, but I'm confident it's
been there. What they now have is a "big bang" class of patch, so big that
anyone reviewing it would have to spend a couple of weeks going through the
codebase trying to understand it. That, as we all know, means two weeks not
doing all the things you are committed to doing.

We know it's there, we know it's current, so how do we use this as an
exercise in pulling something in incrementally?

-Steve