Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HDFS >> mail # user >> Re: What else can be built on top of YARN.


+
Krishna Kishore Bonagiri 2013-05-29, 14:53
+
John Conwell 2013-05-29, 18:04
+
Viral Bajaria 2013-05-29, 18:41
+
Krishna Kishore Bonagiri 2013-05-30, 06:42
+
Jay Vyas 2013-05-30, 12:29
Copy link to this message
-
Re: What else can be built on top of YARN.


Historically, many applications/frameworks wanted to take advantage of just the resource management capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced to used MapReduce even though they didn't have to. Obvious examples are graph processing (Giraph), BSP(Hama), storm/s4 and even a simple tool like DistCp.

There are issues even with map-only jobs.
 - You have to fake key-value processing, periodic pings, key-value outputs
 - You are limited to map slot capacity in the cluster
 - The number of tasks is static, so you cannot grow and shrink your job
 - You are forced to sort data all the time (even though this has changed recently)
 - You are tied to faking things like OutputCommit even if you don't need to.

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
>
> I was going through the motivation behind Yarn. Splitting the responsibility of JT is the major concern.Ultimately the base (Yarn) was built in a generic way for building other generic distributed applications too.
>
> I am not able to think of any other parallel processing use case that would be useful to built on top of YARN. I though of a lot of use cases that would be beneficial when run in parallel , but again ,we can do those using map only jobs in MR.
>
> Can someone tell me a scenario , where a application can utilize Yarn features or can be built on top of YARN and at the same time , it cannot be done efficiently using MRv2 jobs.
>
> thanks,
> Rahul
>
>