Hadoop, mail # general - Large feature development


+
Steve Loughran 2012-08-31, 17:07
+
Todd Lipcon 2012-09-01, 08:20
+
Steve Loughran 2012-09-02, 14:58
Re: Large feature development
Eli Collins 2012-09-02, 19:47
On Sun, Sep 2, 2012 at 7:58 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 1 September 2012 09:20, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>
>> Thanks for starting this thread, Steve. I think your points below are
>> good. I've snipped most of your comment and will reply inline to one
>> bit below:
>>
>> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
>> <[EMAIL PROTECTED]> wrote:
>>
>>
>> >
>> > How then do we get (a) more dev projects working and integrated by the
>> > current committers, and (b) a process in which people who are not yet
>> > contributors/committers can develop non-trivial changes to the project
>> > in a way that it is done with the knowledge, support and mentorship of
>> > the rest of the community?
>>
>>
> Both HDFS2 and MRv2 are in trunk, therefore I consider them successes.
>
>
>> Here's one proposal, making use of git as an easy way to allow
>> non-committers to "commit" code while still tracking development in
>> the usual places:
>>
>
> This is effectively what people do. I'm less worried about the code side of
> things than the integration and mentoring
>
>
>> - Upon anyone's request, we create a new "Version" tag in JIRA.
>>
>
> -1. There are enough versions. There is a "tag" field in JIRA for precisely
> this purpose
>
>
>> - The developers create an umbrella JIRA for the project, and file the
>> individual work items as subtasks (either up front, or as they are
>> developed if using a more iterative model)
>>
>
> as today
>
>
>> - On the umbrella, they add a pointer to a git branch to be used as
>> the staging area for the branch. As they develop each subtask, they
>> can use the JIRA to discuss the development like they would with a
>> normally committed JIRA, but when they feel it is ready to go (not
>> requiring a +1 from any committer) they commit to their git branch
>> instead of the SVN repo.
>>
>
> some integration w/ jenkins and pull testing would be good here
>
>
>> - When the branch is ready to merge, they can call a merge vote, which
>> requires +1 from 3 committers, same as a branch being proposed by an
>> existing committer. A committer would then use git-svn to merge their
>> branch commit-by-commit, or if it is less extensive, simply generate a
>> single big patch to commit into SVN.
>>
>> My thinking is that this would provide a low-friction way for people
>> to collaborate with the community and develop in the open, without
>> having to work closely with any committer to review every individual
>> subtask.
>>
>> Another alternative, if people are reluctant to use git, would be to
>> add a "sandbox/" repository inside our SVN, and hand out commit bit to
>> branches inside there without any PMC vote. Anyone interested in
>> contributing could request a branch in the sandbox, and be granted
>> access as soon as they get an apache SVN account.
>>
>>
> I don't see the technical issues with how the merge is done as the main
> problem.
>
> The barriers to getting your stuff in are
> 1. getting people to care enough to help develop the feature: mentorship,
> collaborative development.
> 2. getting incremental parts in to avoid the continual
> merge-regression-test hell that you go through if you are trying to keep a
> separate branch alive. It's not the technical aspects of the merge so much
> as the need to run all the hadoop tests and your own test suite, and track
> down whether a failure is a regression in -trunk or something in your code.
>
> Jun's patch is an example of this situation. We haven't seen the effort he
> and his colleagues have put into merge and test, but I'm confident it's
> been there. What they now have is a "big bang" class of patch which is so
> big that anyone reviewing it would have to spend a couple of weeks going
> through the codebase trying to understand it. Which as we all know means
> two weeks not doing all the things you are committed to doing.
>
> We know it's there, we know it's current, so how do we use this as an
> exercise in something to pull in incrementally?

Jun's patches from HADOOP-8468 (which were developed on a private
github repo) are being pulled into trunk incrementally; there's no
feature branch (which I think would have been a better route, but at
least the current approach has not prevented some progress).

All the recent examples of features that I can think of that have been
developed upstream first at Apache on feature branches have gone well.

Thanks,
Eli
+
Arun C Murthy 2012-09-01, 19:47
+
Eli Collins 2012-09-02, 20:00
+
Arun Murthy 2012-09-02, 22:11
+
Todd Lipcon 2012-09-03, 01:12
+
Arun C Murthy 2012-09-03, 07:05
+
Todd Lipcon 2012-09-03, 07:31
+
Arun C Murthy 2012-09-03, 07:48
+
Arun C Murthy 2012-09-03, 07:22
+
Rajiv Chittajallu 2012-09-01, 21:29
+
Arun Murthy 2012-09-01, 22:33