Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce, mail # user - RE: yarn-site.xml and aux-services


Copy link to this message
-
Re: yarn-site.xml and aux-services
Harsh J 2013-09-04, 06:05
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd pick trying to implement such a thing than have my
containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>
> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying jar versions to /opt/john-jars/ contents across the cluster though.
>
> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on if this would be a good thing to do.
>
> (I know the right next thing with such an ability people will ask for is hot-code-upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are used
>> to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, August 22, 2013 6:25 PM
>>
>>
>> To: [EMAIL PROTECTED]
>> Subject: Re: yarn-site.xml and aux-services
>>
>>
>>
>>
>>
>> Auxiliary services are essentially administer-configured services. So,
>> they have to be set up at install time - before NM is started.
>>
>>
>>
>> +Vinod
>>
>>
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>> <[EMAIL PROTECTED]>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for
>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is

Harsh J