Re: yarn-site.xml and aux-services
> Thanks for the clarification. I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those to local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.
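
To make the class-loading idea concrete, here is a minimal sketch in plain Java, assuming the jar is already reachable through a URL that the JVM understands. The class name JarServiceLoader and the method instantiate are hypothetical and are not part of YARN or HBase. Plain java.net.URL does not handle hdfs:// locations out of the box, so a real version would first have to copy the jar to local disk or register a suitable URL handler.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not a YARN or HBase API.
public class JarServiceLoader {

    /**
     * Loads className through a URLClassLoader over the given jar URLs and
     * instantiates it via its no-argument constructor.
     */
    public static Object instantiate(URL[] jarUrls, String className) throws Exception {
        // The loader is deliberately left open: a long-lived service may
        // trigger further class loading from the same jars later on.
        URLClassLoader loader =
                new URLClassLoader(jarUrls, JarServiceLoader.class.getClassLoader());
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // With no jar URLs the lookup falls through to the parent loader,
        // which is enough to exercise the code path without a custom jar.
        Object service = instantiate(new URL[0], "java.util.ArrayList");
        System.out.println(service.getClass().getName());
    }
}
```

In a real deployment the URL array would point at the service jar(s) and className would come from configuration.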

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd rather try to implement such a thing than have my
containers implement more logic.
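
For reference, the bookkeeping that the quoted mapper/reducer scenario asks of a per-node service can be sketched in a few lines of Java. This is an in-memory illustration only, under the assumption that files are tracked per application id. NodeOutputRegistry and its methods are hypothetical names, and nothing here talks to YARN, the network, or the file system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical per-node registry; not a YARN API.
public class NodeOutputRegistry {

    // application id -> "taskId:path" entries registered on this node
    private final Map<String, List<String>> filesByApp = new ConcurrentHashMap<>();

    /** A mapper-like task reports a finished output file to the service. */
    public void register(String appId, String taskId, String path) {
        filesByApp.computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
                  .add(taskId + ":" + path);
    }

    /** A reducer-like task asks which outputs this node holds for its application. */
    public List<String> list(String appId) {
        return new ArrayList<>(filesByApp.getOrDefault(appId, Collections.emptyList()));
    }

    /**
     * Called when the AM exits. A real service would also delete the files;
     * this sketch only forgets them and returns how many entries it dropped.
     */
    public int release(String appId) {
        List<String> removed = filesByApp.remove(appId);
        return removed == null ? 0 : removed.size();
    }

    public static void main(String[] args) {
        NodeOutputRegistry registry = new NodeOutputRegistry();
        registry.register("app_1", "task_1", "/tmp/out-1");
        registry.register("app_1", "task_2", "/tmp/out-2");
        System.out.println(registry.list("app_1"));
        System.out.println(registry.release("app_1"));
    }
}
```

Keying everything by application id is what lets the entries outlive an individual task while still being dropped in one call when the AM exits.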

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> Thanks for the clarification. I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those to local disk.
>
> What about having the tasks themselves start the per-node service as a child process? I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have the proper environment and permissions to act on behalf of other tasks? Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You do need to take care of deploying the right jar versions into /opt/john-jars/ across the cluster, though.
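
As an illustration of the aux-services list mentioned above, a yarn-site.xml fragment might look like the following. The service name john_service and the class com.example.JohnAuxService are hypothetical placeholders. The MapReduce shuffle entry is shown under the name mapreduce_shuffle; older releases used mapreduce.shuffle, so check the name your version expects.

```xml
<!-- Sketch only: john_service and com.example.JohnAuxService are hypothetical. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,john_service</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.john_service.class</name>
  <value>com.example.JohnAuxService</value>
</property>
```

The jar containing the custom class still has to be on the NodeManager's classpath on every node before the NM starts.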
>
> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the very next thing people will ask for with such an ability is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are used
>> to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services. So,
>> they have to be set up at install time - before NM is started.
>>
>>
>>
>> +Vinod
>>
>>
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for
>> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is

Harsh J