Re: YARN Features
Hitesh Shah 2013-03-12, 21:01
Answers inline.
-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criterion was the community support of the framework
> which I rate now as very good :)
>
> I would like to ask other questions:
>
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem APIs from Hadoop to a very large extent, so it should work with any filesystem solution. There are a couple of features which do depend on HDFS though, log aggregation for example (collecting the logs of all containers into a central place), and these would need to be disabled. There may be some cases which I am unaware of. If you do see anything which depends on HDFS, please file JIRAs so that we can address the issue.

>
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
>

Yes, you are right: there would be no dependency on MapReduce. The CapacityScheduler is the scheduling module used inside the ResourceManager (which is YARN only).

> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
>

There are a few others who are using YARN in various scenarios, but none who use it for their test infrastructure as far as I know. The closest I can think of would be LinkedIn's use case, where they launch and monitor a bunch of services on a YARN cluster. ( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )

> Thanks,
> Ioan
>
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <[EMAIL PROTECTED]> wrote:
>> Answers regarding DistributedShell.
>>
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>>
>> -- Hitesh
>>
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>>
>>>
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. In case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific, others are Unix specific, and have to be
>>> executed on a node with a specific OS.
>>>
>>
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers (without accounting for where the containers are assigned), does not handle retries on failures,
>> and simply reports a success/failure based on the no. of failures in running the script.
>>
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>>
>> The OS-specific resource ask is something which will need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use case?
>>
>>
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsibility? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>>
>>
>> The way YARN works is slightly different from what you describe above.
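
For the retry and OS-placement requirements raised in the quoted question, the bookkeeping an application might add on top of the DistributedShell example can be sketched in plain Python. This is not YARN API code and not how DistributedShell is implemented: `Script`, `run_all`, the `execute` callback, the node names, the OS labels, and the `UNSCHEDULABLE` status are all hypothetical names chosen for illustration. It assumes the application itself knows each node's OS, which the thread notes YARN did not support as a resource ask at the time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Script:
    name: str
    required_os: str  # hypothetical label, e.g. "linux" or "windows"
    tried_nodes: set = field(default_factory=set)


def run_all(scripts, nodes, execute, max_attempts=3):
    """Run every script, retrying a failure on a node not tried before.

    nodes:   dict of node name -> OS label (assumed known to the application)
    execute: callable(script_name, node_name) -> bool, True on success
    Returns a report: script name -> (status, node or None).
    """
    report = {}
    pending = deque(scripts)
    while pending:
        script = pending.popleft()
        # Only nodes with the right OS that have not already failed this script.
        candidates = [n for n, os_label in sorted(nodes.items())
                      if os_label == script.required_os
                      and n not in script.tried_nodes]
        if not candidates:
            # No matching node at all vs. every matching node already failed.
            status = "FAILED" if script.tried_nodes else "UNSCHEDULABLE"
            report[script.name] = (status, None)
            continue
        node = candidates[0]
        script.tried_nodes.add(node)
        if execute(script.name, node):
            report[script.name] = ("SUCCEEDED", node)
        elif len(script.tried_nodes) < max_attempts:
            pending.append(script)  # re-queue for a different node
        else:
            report[script.name] = ("FAILED", None)
    return report


# Example: build.sh fails on n1, and no node offers "macos".
nodes = {"n1": "linux", "n2": "linux", "n3": "windows"}


def fake_execute(script_name, node_name):
    return not (script_name == "build.sh" and node_name == "n1")


scripts = [Script("build.sh", "linux"),
           Script("test.bat", "windows"),
           Script("mac.sh", "macos")]
report = run_all(scripts, nodes, fake_execute)
```

Marking a script as unschedulable when no node matches its OS is only one possible policy; an application could equally keep it queued until a suitable node joins the grid.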