Hadoop, mail # general - [VOTE] Maintain a single committer list for the Hadoop project


Re: [VOTE] Maintain a single committer list for the Hadoop project
Arun C Murthy 2012-08-28, 23:12
On Aug 23, 2012, at 9:20 PM, Eli Collins wrote:

> Per this thread [1] should we have a single set of committers for the
> entire Hadoop project, ie all subprojects?

I feel like we need to have a wider discussion here.

This discussion started when a diverse set of folks working on YARN for a year and a half wanted their own identity and an acknowledgement of the fact that they are a distinct community. In retrospect, I went about convincing the wider Hadoop community about this in the wrong way. My apologies.

Upon reflection, I think Chris Mattmann has convinced me that we have an even wider issue at hand, and that the right way to a better place, not just for YARN but for all of Hadoop, is to expedite the process of graduating Hadoop sub-projects into TLPs. This is a mere reflection of the fact that Hadoop is not a single community.

Historically there have been at least 2 communities (HDFS, MapReduce) under the Hadoop umbrella; and there are now 3 (HDFS, MapReduce, YARN).
At least for the last 3 years, if not more, the overwhelming majority of contributors to Hadoop have focussed exclusively on one of the sub-projects. That is a clear indicator.
This is exactly the thinking behind the graduation of former sub-projects like HBase, Hive & Pig, upon the nudge received by the Hadoop PMC from the Board.

The good news is that, in principle, most seem to agree on the need for Hadoop sub-projects to stand alone and on the path to get there. By not tying them together, it could lead to several great outcomes, such as ensuring HDFS pays as much attention to HBase as to MapReduce, YARN pays attention to projects beyond MapReduce, etc.

Rather than sweep this under the carpet, I feel we are better off acknowledging this.

This is very much in keeping with the way the ASF and the Board want to see communities - small and focussed on a single project.

A meta or umbrella community like Hadoop leads to issues which are well documented and understood in the ASF, something experienced Apache Members like Chris Mattmann have repeatedly pointed out.

It is also fair, per Chris Douglas, to set a reasonable time frame. After due consideration, I think doing this before hadoop-2 is declared stable (GA) is the most reasonable option. It gives us the necessary headroom and will ensure we don't confuse users further by doing it after hadoop-2 is already out. Let's discuss the mechanics, timelines etc. further.

Yes, this is hard work and there are several technical challenges. But the ASF is all about communities, and I'm sure we can solve these technical issues for the better long-term health of these distinct communities.

Thoughts?

thanks,
Arun