Hadoop, mail # general - [DISCUSS] - YARN as a sub-project of Apache Hadoop


Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Thomas Graves 2012-07-26, 20:07
+1 for the idea.  I think separating the framework from the MR application
makes sense.  

Tom
On 7/25/12 8:40 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and
> externally) start to explore integration with YARN such as Apache Hama, Apache
> Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of
> making Apache Hadoop a much more general data processing platform (& storage,
> of course) and not tied to MapReduce alone for processing data. Furthermore,
> we already have people contributing interesting prototypes such as
> DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of Apache
> Hadoop along with Common, HDFS & MapReduce. I believe this would help other
> communities realize that they could consider using YARN as a general-purpose
> resource management layer and help us enhance YARN beyond its humble
> beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code base
> out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.
>
> Thoughts?
>
> ----
>
> What does it mean to the Hadoop developer community?
>
> # Project dependencies
>
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN &
> MapReduce. As today, the dependencies *do not change*:
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS
> - MapReduce depends on Common, HDFS & YARN.
>
> # Jira & Mailing lists
>
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>
> We already use separate MAPREDUCE jira issues for making changes to YARN
> (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce
> ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a
> change.
>
> # Subversion
>
> Not much at all! YARN has, since the beginning, been developed with the
> understanding that it is very independent of MapReduce and the code-bases are
> already independent i.e. hadoop-mapreduce-project/hadoop-yarn and
> hadoop-mapreduce-project/hadoop-mapreduce-client.
>
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build
> infrastructure.
>
> # Release Cycles
>
> No changes.
>
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
>
> thanks,
> Arun
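
As an illustration of the "Project dependencies" list in the quoted proposal, here is a minimal sketch that encodes the four sub-projects and derives an order in which they could be built. The dictionary and the build_order helper are hypothetical names made up for this example; they are not part of Hadoop or its Maven build, and the sketch assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Each sub-project maps to the sub-projects it depends on,
# exactly as listed in the proposal (illustrative data only).
DEPENDENCIES = {
    "Common": set(),
    "HDFS": {"Common"},
    "YARN": {"Common", "HDFS"},
    "MapReduce": {"Common", "HDFS", "YARN"},
}


def build_order(deps):
    """Return the sub-projects so that each one follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDENCIES)))
```

Because each sub-project depends on all of the ones before it, the graph is a simple chain and there is only one valid order. MapReduce comes last, which matches the point that YARN itself has no dependency on MapReduce.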