

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Hi Alejandro,

On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:

> Chris, thanks for initiating the discussion.

No probs!

>
> IMO a pre-requisite to this is to figure out how we'll handle the following:
>

To be honest, I don't think any of the below are prereqs. They are technical
issues that can be dealt with after simply SVN copy'ing Hadoop as it stands
today, per my SVN commands, into each of the new TLPs. Each TLP can then use
that copy as a starting point for addressing the below, as part of the natural
evolution of the project code.

That being said, if I had to guess what the TLPs would do to address the below
once they are created:

> * Where does common stuff live?

This usually gets sorted out over time, depending on how often things release
and on the other factors cited in other threads and discussions over the past
years in Hadoop. You guys clearly have a good handle on things like this.

I would just encourage the subsequent TLPs not to worry about doing everything
perfectly. If you start out with the same code base, you can selectively and
iteratively clean things up and refactor, and the answers to questions like this
will emerge naturally during that evolution.

> * What are the public interfaces of each project (towards the other projects)?

This is something that each distinct community can answer once they are bootstrapped
as TLPs. You can decide what portion of the code is really under charter and then work
as a community to figure this out. Sorry I can't be more specific than that.

> * How do we do development/releases? In tandem? Separate?

Releasing in tandem across communities never really works. Releases should occur
separately, per community and TLP, on their own schedule. Code that depends on
other projects either has to wait for those communities/TLPs/projects to fix things
or add new features, or it has to insulate itself: keep the fixes locally in your
project's SVN until they can be pushed upstream and included in the other
community's releases.

Ask yourself this: if you guys have a dependency on, e.g., Tomcat, and there is some
critical bug or new feature you want in Tomcat, how would you deal with that? I would
posit that you could deal with this situation the same way. Keep the fix to Tomcat
locally in your project, and work to get that fix upstream and included in some
subsequent Tomcat release.

> How this
> will work in practice, currently we are constantly tweaking things
> inter-projects, sometimes in the same JIRAs, sometimes in follow up
> JIRAs.

Technically you are doing that, but community-wise it's not working out, and it hasn't
really been working for years. I've been around Hadoop since its inception (I was a Nutch
committer before Hadoop existed), and though it's been hugely successful, and really
awesome and super great (congrats, everyone, BTW!), the community issues have always
cropped up b/c it's one big huge umbrella project, and that doesn't work at Apache.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++