Harsh J 2013-08-23, 17:00
Re: yarn-site.xml and aux-services
Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do
let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes, instead finessing the service into place by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
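For illustration, shipping the service binary to each node via LocalResources, as John describes above, might look roughly like the sketch below. It assumes the Hadoop 2.x YARN client API; the HDFS path and the resource name are hypothetical.

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class ServiceJarLocalizer {
      // Registers a jar kept on HDFS as a LocalResource so the NodeManager
      // downloads it into the container's working directory before launch.
      public static void addServiceJar(ContainerLaunchContext ctx,
                                       Configuration conf) throws Exception {
        Path jarOnHdfs = new Path("hdfs:///apps/john/service.jar");  // hypothetical location
        FileSystem fs = jarOnHdfs.getFileSystem(conf);
        FileStatus stat = fs.getFileStatus(jarOnHdfs);

        LocalResource jar = LocalResource.newInstance(
            ConverterUtils.getYarnUrlFromPath(jarOnHdfs),
            LocalResourceType.FILE,
            LocalResourceVisibility.APPLICATION,
            stat.getLen(),
            stat.getModificationTime());

        // The container sees the jar as ./service.jar and can spawn the
        // service process from it.
        ctx.setLocalResources(Collections.singletonMap("service.jar", jar));
      }
    }
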
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
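Class-loading straight from HDFS, as suggested above, could be approximated by pulling the jar down and handing it to a URLClassLoader, roughly the way HBase handles co-processor jars. This is a sketch only; the HDFS path and class name are hypothetical, and a real implementation would also need to cache and clean up the local copy.

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsServiceLoader {
      // Copies the service jar out of HDFS and class-loads it locally.
      public static Class<?> loadServiceClass(Configuration conf) throws Exception {
        Path jarOnHdfs = new Path("hdfs:///apps/john/aux-service.jar");  // hypothetical location
        File localJar = File.createTempFile("aux-service", ".jar");

        FileSystem fs = jarOnHdfs.getFileSystem(conf);
        fs.copyToLocalFile(jarOnHdfs, new Path(localJar.getAbsolutePath()));

        URLClassLoader loader = new URLClassLoader(
            new URL[] { localJar.toURI().toURL() },
            HdfsServiceLoader.class.getClassLoader());

        // Hypothetical fully qualified name of the service implementation.
        return Class.forName("com.example.JohnAuxService", true, loader);
      }
    }
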
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
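One way to realize step 2 of the scenario above (spawn the per-node service only if one is not already running) is to treat a well-known local port as the lock, and to detach the spawned service from the task's process group so the NodeManager's group kill does not reap it, which also touches the setpgrp() question. A rough sketch with a hypothetical port and jar name; note the small race window between probing the port and the service binding it.

    import java.io.IOException;
    import java.net.ServerSocket;

    public class NodeServiceLauncher {
      private static final int SERVICE_PORT = 53135;  // hypothetical well-known port

      // True if nothing is listening on the service port yet, i.e. this task
      // should spawn the per-node service.
      public static boolean shouldSpawnService() {
        try (ServerSocket probe = new ServerSocket(SERVICE_PORT)) {
          return true;   // port was free; release it for the real service
        } catch (IOException alreadyBound) {
          return false;  // another task already started the service
        }
      }

      public static void main(String[] args) throws IOException {
        if (shouldSpawnService()) {
          // setsid gives the child its own session and process group, so
          // killing the task's process group leaves the service running.
          new ProcessBuilder("setsid", "java", "-jar", "service.jar")
              .inheritIO()
              .start();
        }
      }
    }
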
>> -----Original Message-----
>> From: Harsh J [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions under /opt/john-jars/ across the cluster yourself, though.
>>
>> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the right next thing with such an ability people will ask for
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>>> Are there recommended conventions for adding additional code to a

Harsh J
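
For reference, the /opt/john-jars-plus-classpath route described above ends up wiring a class like the following into every NodeManager. The service name, class, and paths here are hypothetical; only the two yarn-site.xml keys shown in the comment are assumed, mirroring how the MR shuffle service is configured.

    import java.nio.ByteBuffer;

    import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
    import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
    import org.apache.hadoop.yarn.server.api.AuxiliaryService;

    // Declared in each NodeManager's yarn-site.xml roughly as:
    //   yarn.nodemanager.aux-services                    = john_service
    //   yarn.nodemanager.aux-services.john_service.class = com.example.JohnAuxService
    // with the jar deployed under /opt/john-jars and added to the NM classpath.
    public class JohnAuxService extends AuxiliaryService {

      public JohnAuxService() {
        super("john_service");
      }

      @Override
      public void initializeApplication(ApplicationInitializationContext ctx) {
        // Called when the first container of an application starts on this node.
      }

      @Override
      public void stopApplication(ApplicationTerminationContext ctx) {
        // Called when the application finishes; per-app cleanup goes here.
      }

      @Override
      public ByteBuffer getMetaData() {
        // Handed back to the AM in the container-start response; the MR shuffle
        // service uses this to report its port, for example.
        return ByteBuffer.allocate(0);
      }
    }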