Hadoop >> mail # general >> [DISCUSS] - YARN as a sub-project of Apache Hadoop


Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
On 25 July 2012 18:40, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and
> externally) start to explore integration with YARN such as Apache Hama,
> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our
> hopes of making Apache Hadoop a much more general data processing platform
> (& storage, of course) and not tied to MapReduce alone for processing data.
> Furthermore, we already have people contributing interesting prototypes
> such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of
> Apache Hadoop along with Common, HDFS & MapReduce. I believe this would
> help other communities realize that they could consider using YARN as a
> general-purpose resource management layer and help us enhance YARN beyond
> its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code
> base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.
>
>

If the goal is to clearly partition the scheduling layer from the app
layer, and you think it helps isolate changes, then yes

+1

Forcing that strict hierarchy does ensure that you really do have a clean
separation of modules, and emphasises that it is more than just MapRed. As
people add more applications, I can see that the separation would get their
needs addressed. Having a separate project could also allow Yarn to do a
point release in sync with those other projects, as well as do co-ordinated
releases with Hadoop itself.

It should also make clear that Yarn is designed to be a topology-aware
underpinning of a datacentre, interesting in its own right. Which reminds
me, I'd better get my topology stuff in.

-Steve