

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
On 25 July 2012 18:40, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and
> externally) start to explore integration with YARN such as Apache Hama,
> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our
> hopes of making Apache Hadoop a much more general data processing platform
> (& storage, of course) and not tied to MapReduce alone for processing data.
> Furthermore, we already have people contributing interesting prototypes
> such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of
> Apache Hadoop along with Common, HDFS & MapReduce. I believe this would
> help other communities realize that they could consider using YARN as a
> general-purpose resource management layer and help us enhance YARN beyond
> its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code
> base out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.
>
>

If the goal is to clearly partition the scheduling layer from the app
layer, and you think it helps isolate changes, then yes.

+1

Forcing that strict hierarchy does ensure that you really do have a clean
separation of modules, and emphasises that it is more than just MapRed. As
people add more applications, I can see that the separation would get their
needs addressed. Having a separate project could also allow Yarn to do a
point release in sync with those other projects, as well as do co-ordinated
releases with Hadoop itself.

It should also make clear that Yarn is designed to be a topology-aware
underpinning of a datacentre, interesting in its own right. Which reminds
me, I'd better get my topology stuff in.

-Steve