RE: yarn-site.xml and aux-services
John Lilley 2013-09-05, 20:08
https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 05, 2013 12:14 PM
To: <[EMAIL PROTECTED]>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and instead to finesse the service by spawning it from a task using LocalResources, but that is probably fraught with trouble.
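
For context, a minimal sketch of the LocalResources route being described, as it might look on the AM side. The HDFS path, resource name and launch command are hypothetical, and this says nothing about keeping the service alive beyond the container:

    // Hypothetical sketch: localize a service jar from HDFS and launch it in a container.
    import java.util.Collections;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class ServiceLauncher {
      public static ContainerLaunchContext buildContext(Configuration conf) throws Exception {
        Path jar = new Path("hdfs:///apps/john/service.jar");   // hypothetical path
        FileStatus st = FileSystem.get(conf).getFileStatus(jar);

        // Ask the NodeManager to download the jar into the container's working dir.
        LocalResource res = LocalResource.newInstance(
            ConverterUtils.getYarnUrlFromPath(jar),
            LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
            st.getLen(), st.getModificationTime());

        // Launch command: run the localized jar as the container's own process.
        return ContainerLaunchContext.newInstance(
            Collections.singletonMap("service.jar", res),
            Collections.<String, String>emptyMap(),
            Collections.singletonList("$JAVA_HOME/bin/java -jar service.jar 1>stdout 2>stderr"),
            null, null, null);
      }
    }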
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
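
A rough sketch of that idea, assuming the jar is first copied out of HDFS to local disk before a URLClassLoader picks it up; the paths and class names are placeholders, and this glosses over how HBase's coprocessor loader actually works:

    // Hypothetical sketch: pull a jar down from HDFS and class-load from the local copy.
    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsJarLoader {
      public static Class<?> loadClassFromHdfsJar(Configuration conf, String hdfsJar,
                                                  String className) throws Exception {
        // Copy hdfs://.../service.jar into a local temp dir (classes must live on local disk).
        File localDir = Files.createTempDirectory("aux-jars").toFile();
        File localJar = new File(localDir, "service.jar");
        FileSystem.get(new Path(hdfsJar).toUri(), conf)
            .copyToLocalFile(new Path(hdfsJar), new Path(localJar.getAbsolutePath()));

        // Load the requested class from the localized jar.
        URLClassLoader loader = new URLClassLoader(
            new URL[] { localJar.toURI().toURL() }, HdfsJarLoader.class.getClassLoader());
        return loader.loadClass(className);
      }
    }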
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
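
A sketch of the kind of detachment being asked about, using the setsid(1) utility rather than a raw syscall so the child leaves the task's session and process group; the pid-file check and paths are invented, and the NM may still reclaim such processes by other means:

    // Hypothetical sketch: start the per-node service in its own session so that a
    // kill of the task's process group does not take the service down with the task.
    import java.io.File;
    import java.io.IOException;

    public class DetachedServiceStarter {
      public static void startIfAbsent(File pidFile, String serviceJar) throws IOException {
        if (pidFile.exists()) {
          return; // assume the service (which writes this file) is already up on this node
        }
        new ProcessBuilder("setsid", "java", "-jar", serviceJar)
            .redirectErrorStream(true)
            .redirectOutput(new File("/tmp/per-node-service.log")) // placeholder log path
            .start();
      }
    }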
>>
>> Even given that, would the child process of one task have the proper environment and permissions to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
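
Purely as an illustration of steps 2, 3 and 5 above, here is a toy per-node registry that mapper-like tasks could register output paths with and reducer-like tasks could query. The port and line protocol are invented, and the actual file serving, lifetime management and security pieces are left out:

    // Toy sketch of the hypothetical "persistent service": tracks which output files
    // each mapper-like task registered on this node, so reducer-like tasks can look them up.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class PerNodeRegistry {
      // taskId -> output file paths registered by mapper-like tasks on this node
      private static final Map<String, List<String>> FILES = new HashMap<String, List<String>>();

      public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(18080); // made-up port
        while (true) {
          // One request per connection, handled sequentially:
          //   "REGISTER <taskId> <path>"  (step 3)   or   "LIST <taskId>"  (step 5)
          try (Socket s = server.accept();
               BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
               PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line = in.readLine();
            if (line == null) continue;
            String[] cmd = line.split(" ");
            if ("REGISTER".equals(cmd[0]) && cmd.length == 3) {
              List<String> paths = FILES.get(cmd[1]);
              if (paths == null) FILES.put(cmd[1], paths = new ArrayList<String>());
              paths.add(cmd[2]);
              out.println("OK");
            } else if ("LIST".equals(cmd[0]) && cmd.length == 2) {
              List<String> paths = FILES.get(cmd[1]);
              out.println(paths == null ? "" : paths);
            } else {
              out.println("ERR");
            }
          }
        }
      }
    }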
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions under /opt/john-jars/ across the cluster yourself, though.
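
Concretely, "configuring the classes under the aux-services list" means implementing YARN's AuxiliaryService and registering it in yarn-site.xml via yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.<name>.class. A bare-bones skeleton, with an invented service name:

    // Skeleton of a custom NodeManager auxiliary service. To enable it, add the service
    // name (e.g. "john_service") to yarn.nodemanager.aux-services and point
    // yarn.nodemanager.aux-services.john_service.class at this class in yarn-site.xml,
    // with the jar on the NodeManager's classpath (e.g. under /opt/john-jars).
    import java.nio.ByteBuffer;
    import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
    import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
    import org.apache.hadoop.yarn.server.api.AuxiliaryService;

    public class JohnAuxService extends AuxiliaryService {

      public JohnAuxService() {
        super("john_service"); // invented service name
      }

      @Override
      protected void serviceStart() throws Exception {
        // Start listening/serving here; this runs inside the NodeManager process.
        super.serviceStart();
      }

      @Override
      public void initializeApplication(ApplicationInitializationContext ctx) {
        // Called when an application is initialized on this node.
      }

      @Override
      public void stopApplication(ApplicationTerminationContext ctx) {
        // Called when the application finishes on this node; clean up per-app state.
      }

      @Override
      public ByteBuffer getMetaData() {
        // Handed back to the AM in the container-start response (e.g. a port number).
        return ByteBuffer.allocate(0);
      }
    }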
>>
>> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.

Harsh J