Hadoop, mail # general - [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eric Baldeschwieler 2012-08-29, 17:42
Hi Tom,

> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.

Good point.  I agree.

> The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here.
I guess the question is who would want to be on that project?  I don't think the current bundle of stuff in common would form a good kernel for a community.  A lack of a coherent community for common has always been a problem with the project split IMO.  I could see folks deciding that they were going to build a community around a really good RPC stack, or some other chunk of common, but frankly I think it is premature to do that.  Proposals welcome of course, but I think the HDFS folks will want a copy of the RPC stuff in their project and most of the rest of the stuff in common is too small to merit a project and is more easily handled via duplication and then sorting it out / dead code elimination.

On Aug 29, 2012, at 10:30 AM, Tom White wrote:

> On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>>
>> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>>
>>> Chris, thanks for initiating the discussion.
>>
>> Likewise, thanks Chris!
>>
>>>
>>> IMO a pre-requisite to this is to figure out how we'll handle the following:
>>>
>>
>>
>> Good points - I'd recommend we keep Common and HDFS in the same project.
>
> That seems reasonable. The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here. Having said that, a Common TLP
> lacks a clear 'mission' since it doesn't offer any standalone
> services. Also, it may diminish in utility over time if pieces are
> moved into HDFS, MapReduce and YARN.
>
>> Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway, e.g. the move to ProtoBufs was led by Sanjay, Suresh, Todd, Jitendra et al.
>
> Does the work to use versioned protocol buffers for RPC mean that
> different releases of HDFS and MapReduce can work together yet? If
> not, this is something we should be working towards (although that
> shouldn't block a move to TLPs).
>
>>
>> We can move SequenceFile into MR if necessary and keep same package names for compatibility.
>
> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.
>
> Cheers,
> Tom
>
>>
>> We should, of course, stop tweaking things in different projects in the same jira - we've been reasonably good at not doing that.
>>
>> Thoughts?
>>
>> Arun
>>
>>> * Where does common stuff live?
>>> * What are the public interfaces of each project (towards the other projects)?
>>> * How do we do development/releases? In tandem? Separate? How this
>>> will work in practice, currently we are constantly tweaking things
>>> inter-projects, sometimes in the same JIRAs, sometimes in follow up
>>> JIRAs.
>>>
>>> Thoughts?
>>>
>>> Thxs.
>>>
>>> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
>>> <[EMAIL PROTECTED]> wrote:
>>>> [decided to minimize traffic and to simply put this in one thread]
>>>>
>>>> Hi Guys,
>>>>
>>>> See the recent discussion on these threads:
>>>>
>>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
>>>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
>>>>
>>>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
>>>> as a single project, that's masking separate communities that themselves are really
>>>> separate ASF projects.
>>>>
>>>> At the ASF, this has been a problem area called "umbrella" projects and over the years,
>>>> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of