Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote:

> 2. From technical (not community) viewpoint your "svn copy" is an ugly
> approach,
> as it creates a lot of code duplication and will result in a
> maintenance nightmare or / and
> will require many man-months to fix. My point is that you cannot
> neglect "technical issues" when you solve community problems.

Agreed, Konstantin. I don't think Chris was being serious here; it was merely *one* way forward.

There are, easily, better ways to solve this.

The big cross-project dependencies are IPC/RPC, Security and Metrics2. Others include the network topology apis etc. They need to be marked Public/Stable. We need to maintain compatibility across a major (stable) release anyway. This is true for every other Public/Stable api.

So, *technically*, the requirements are:
a) Ensure projects only use Public/Stable apis.
b) Maintain compatibility for Public/Stable apis within a major release.
c) Clearly, key components like IPC, Metrics2, Security etc. *should* be marked stable by the time the ersatz hadoop-2 codebase is declared 'stable'.
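
To make requirement (a) concrete, here is a minimal sketch of how a downstream project could audit its dependencies. The `Public` and `Stable` annotations below are hypothetical stand-ins, loosely modeled on Hadoop's `InterfaceAudience.Public` and `InterfaceStability.Stable`, and the three dependency classes are invented purely for illustration. It is not how any existing Hadoop tool works.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class ApiAudit {
    // Hypothetical stand-ins for Hadoop's audience/stability annotations.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Public {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Stable {}

    // Invented example dependencies (assumptions, not real Hadoop classes).
    @Public @Stable
    static class RpcClient {}

    @Public
    static class MetricsSink {}   // public, but not yet marked stable

    static class InternalCodec {} // no annotations: private to its project

    // A class is safe to depend on only if it carries both annotations.
    static boolean isPublicStable(Class<?> c) {
        return c.isAnnotationPresent(Public.class)
            && c.isAnnotationPresent(Stable.class);
    }

    // Returns the simple names of dependencies that violate requirement (a).
    static List<String> violations(Class<?>... deps) {
        List<String> out = new ArrayList<>();
        for (Class<?> c : deps) {
            if (!isPublicStable(c)) {
                out.add(c.getSimpleName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            violations(RpcClient.class, MetricsSink.class, InternalCodec.class));
    }
}
```

In this sketch, `MetricsSink` and `InternalCodec` would be flagged, which is the kind of gap requirement (c) asks us to close before declaring a stable release.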

None of these seem like the fashionably *scary* technical issues some people are using to justify blocking the way forward.

And, no, YARN/MR aren't the only downstream projects in this mix. HBase, e.g., uses hadoop metrics2 and our security apis. We need to support compatibility for HBase anyway. There are several other projects in the same boat. Pig/Hive need FileSystem, Security & MR apis. This is just the *reality* of being at the bottom of the stack.

Yes, there is work left - but that work is something we need to do with or without the split.

Furthermore, yes, the previous split/unsplit was painful. However, beyond that, we have made progress across several dimensions which should make this one smoother:
a) Mavenization has helped a *lot*.
b) Unlike the previous attempt, HDFS2 & YARN (v/s HDFS1 & MR1) no longer share the same run-time scripts etc.
c) We have been fairly good at following through on our stability/visibility guarantees on APIs.

As a result, I don't buy the *this is technically impossible* argument.

As Konstantin suggested, we could spend the next few weeks/months preparing.
Even after the split we would be in the alpha/beta stage, whereby we can recover from mistakes at the cost of a few extra HDFS alpha/beta releases for the sake of the MR/YARN projects. That seems like an acceptable cost given that there are several volunteers to RM releases.

Last but not least, the previous split failed because the overall community did not invest in ensuring its success. That's clearly *not* the case this time around. I'm very confident of that.