Re: YARN Features
Hitesh Shah 2013-03-12, 21:01
On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:
> Another evaluation criterion was the community support of the framework
> which I rate now as very good :)
> I would like to ask other questions:
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?
It uses the generic filesystem APIs from Hadoop to a very large extent, so it should work with any filesystem solution.
There are a couple of features which do depend on HDFS though - log aggregation, for example ( collecting the logs of all containers into a
central place ), would need to be disabled. There may be some cases which I am unaware of. If you do see anything which
depends on HDFS, please do file JIRAs so that we can address the issue.
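
As a rough illustration of the two settings touched on above (the default filesystem and log aggregation), the sketch below lists them and prints each in Hadoop's <property> form. The property names fs.defaultFS and yarn.log-aggregation-enable are taken from the Hadoop 2.x defaults and should be checked against the release in use. The file:/// value is only a placeholder for whatever shared filesystem is mounted, and the class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Hadoop, only a way to list the settings.
final class NonHdfsYarnSettings {

    // Settings to review when running YARN without HDFS (values are assumptions).
    static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        // core-site.xml: default filesystem; file:/// stands in for a shared mount.
        s.put("fs.defaultFS", "file:///");
        // yarn-site.xml: keep log aggregation switched off.
        s.put("yarn.log-aggregation-enable", "false");
        return s;
    }

    // Renders one setting the way it would appear inside <configuration>.
    static String toPropertyXml(String name, String value) {
        return "<property><name>" + name + "</name><value>" + value + "</value></property>";
    }

    public static void main(String[] args) {
        settings().forEach((k, v) -> System.out.println(toPropertyXml(k, v)));
    }
}
```

The first entry belongs in core-site.xml and the second in yarn-site.xml. Any other HDFS dependency that turns up would need its own setting or a JIRA.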
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
Yes - you are right - there would be no dependency on MapReduce.
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ).
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute a set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
There are a few others who are using YARN in various scenarios - none who use it for their test infrastructure as far as I know.
The closest I can think of would be LinkedIn's use-case where they launch and monitor a bunch of services on a YARN cluster.
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <[EMAIL PROTECTED]> wrote:
>> Answers regarding DistributedShell.
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> -- Hitesh
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts' execution has to be provided. In case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific, others are Unix specific, and have to be
>>> executed on a node with a specific OS.
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the number of failures in running the script.
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> The OS specific resource ask is something which will need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case?
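
To make the missing pieces concrete (OS-aware placement, retry on a different node, and a final report), here is a minimal sketch of the bookkeeping an application built on the example code might add. It is plain Java with no YARN calls. The Node and Script types, the outcome strings, and the sequential loop are all assumptions for illustration. A real ApplicationMaster would instead react to container allocations and completions, and YARN itself had no OS-specific resource ask at the time of this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: retry and OS matching done by the application itself.
final class ScriptRetrySketch {

    static final class Node {
        final String name, os;
        Node(String name, String os) { this.name = name; this.os = os; }
    }

    static final class Script {
        final String name, os;
        Script(String name, String os) { this.name = name; this.os = os; }
    }

    // Tries each script on nodes with a matching OS, moving to the next such
    // node after a failure, for at most maxAttempts tries. Returns script name
    // mapped to "SUCCEEDED on <node>", "FAILED", or "NO_MATCHING_NODE".
    static Map<String, String> runAll(List<Script> scripts, List<Node> nodes,
                                      BiPredicate<Script, Node> execute, int maxAttempts) {
        Map<String, String> report = new LinkedHashMap<>();
        for (Script s : scripts) {
            String outcome = "NO_MATCHING_NODE";
            int attempts = 0;
            for (Node n : nodes) {
                if (!n.os.equals(s.os)) continue;    // wrong OS, skip this node
                if (attempts == maxAttempts) break;  // retry budget exhausted
                attempts++;
                if (execute.test(s, n)) {
                    outcome = "SUCCEEDED on " + n.name;
                    break;
                }
                outcome = "FAILED";                  // next try uses a different node
            }
            report.put(s.name, outcome);
        }
        return report;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("n1", "linux"), new Node("n2", "linux"));
        List<Script> scripts = Arrays.asList(
                new Script("build.sh", "linux"), new Script("test.bat", "windows"));
        // Simulated executor: every script fails on n1 and succeeds elsewhere.
        Map<String, String> report = runAll(scripts, nodes, (s, n) -> !n.name.equals("n1"), 3);
        report.forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

With the simulated executor in main, build.sh fails on n1 and is retried on n2, while test.bat is reported as NO_MATCHING_NODE because the sketch has no Windows node. That last outcome is a choice made by this sketch, not a statement about how YARN behaves.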
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsibility? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built-in feature of YARN?
>> The way YARN works is slightly different from what you describe above.