MapReduce >> mail # user >> Re: YARN Features


Hitesh Shah 2013-03-12, 18:47
Ioan Zeng 2013-03-12, 19:26
Re: YARN Features
Answers inline.

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criterion was the community support of the framework
> which I rate now as very good :)
>
> I would like to ask other questions:
>
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem APIs from Hadoop to a very large extent, so it should work with any filesystem solution.
There are a couple of features which do depend on HDFS though - log aggregation, for example ( collecting the logs of all containers into a
central place ), would need to be disabled. There may be some cases which I am unaware of. If you do see anything which
depends on HDFS, please do file JIRAs so that we can address the issue.
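
As a rough illustration of the point above, the fragment below sketches the two settings most likely to matter when running YARN without HDFS. The property names `yarn.log-aggregation-enable` and `fs.defaultFS` are standard Hadoop 2.x settings, but the `file:///` value is only a placeholder assumption: it makes sense for a single node or for a mount shared at the same path on every node, and any other Hadoop-compatible filesystem URI could take its place. Log aggregation is typically off by default, so the first property may simply need to stay unset.

```xml
<!-- yarn-site.xml: keep log aggregation off when no HDFS is available -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>

<!-- core-site.xml: default filesystem URI. The value below is a placeholder
     (assumption); substitute the URI of whatever Hadoop-compatible
     filesystem the cluster nodes share. -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>
```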

>
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
>

Yes - you are right - there would be no dependency on MapReduce.
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ).
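
For reference, a minimal sketch of how the scheduler is usually selected: the ResourceManager reads the scheduler class from `yarn-site.xml`, with no MapReduce configuration involved. Whether this needs to be set explicitly depends on the release, since the default scheduler has differed between versions. Queue definitions themselves normally go in a separate `capacity-scheduler.xml` (for example under `yarn.scheduler.capacity.root.queues`), which is not shown here.

```xml
<!-- yarn-site.xml: select the CapacityScheduler for the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```
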
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
>

There are a few others who are using YARN in various scenarios - none who use it for their test infrastructure, as far as I know.
The closest I can think of would be LinkedIn's use case, where they launch and monitor a bunch of services on a YARN cluster.
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )
> Thanks,
> Ioan
>
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <[EMAIL PROTECTED]> wrote:
>> Answers regarding DistributedShell.
>>
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>>
>> -- Hitesh
>>
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>>
>>>
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts' execution has to be provided. In case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific, others are Unix specific, and have to be
>>> executed on a node with a specific OS.
>>>
>>
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the number of failures in running the script.
>>
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>>
>> The OS-specific resource ask is something which will need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use case?
>>
>>
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager's responsibility? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>>
>>>
>>
>> The way YARN works is slightly different from what you describe above.
Hitesh Shah 2013-03-12, 18:38