Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HDFS >> mail # user >> Re: What else can be built on top of YARN.


+
Krishna Kishore Bonagiri 2013-05-29, 14:53
+
John Conwell 2013-05-29, 18:04
+
Viral Bajaria 2013-05-29, 18:41
+
Krishna Kishore Bonagiri 2013-05-30, 06:42
+
Jay Vyas 2013-05-30, 12:29
Copy link to this message
-
Re: What else can be built on top of YARN.


Historically, many applications/frameworks wanted to take advantage of just the resource management capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced to used MapReduce even though they didn't have to. Obvious examples are graph processing (Giraph), BSP(Hama), storm/s4 and even a simple tool like DistCp.

There are issues even with map-only jobs.
 - You have to fake key-value processing, periodic pings, key-value outputs
 - You are limited to map slot capacity in the cluster
 - The number of tasks is static, so you cannot grow and shrink your job
 - You are forced to sort data all the time (even though this has changed recently)
 - You are tied to faking things like OutputCommit even if you don't need to.

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
>
> I was going through the motivation behind Yarn. Splitting the responsibility of JT is the major concern.Ultimately the base (Yarn) was built in a generic way for building other generic distributed applications too.
>
> I am not able to think of any other parallel processing use case that would be useful to built on top of YARN. I though of a lot of use cases that would be beneficial when run in parallel , but again ,we can do those using map only jobs in MR.
>
> Can someone tell me a scenario , where a application can utilize Yarn features or can be built on top of YARN and at the same time , it cannot be done efficiently using MRv2 jobs.
>
> thanks,
> Rahul
>
>

NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB