Hadoop >> mail # user >> Re: task jvm bootstrapping via distributed cache


Re: task jvm bootstrapping via distributed cache
Hi Stan,

If I understand your question, you want to ship a jar to the nodes where the task will run before the task starts?

I'm not sure what you're trying to do; your example isn't really clear.

See: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html

When you pull files out of the cache, you get the local path to the jar,
or at least you should be able to get it.

I'm assuming you're doing this in your setup() method?
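
As a rough illustration of what I mean by getting the path in setup(), here is a minimal sketch. It uses only the JDK, so the list of localized paths is passed in as plain strings. In a real job I would expect that list to come from DistributedCache.getLocalCacheFiles(conf), converting each Path with toString(). The class name, method name and the sample paths are made up for the example.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop.
final class CacheLookup {
    private CacheLookup() {}

    /**
     * Returns the first localized path whose file name equals jarName.
     * localPaths is assumed to hold the task-local paths reported by the
     * distributed cache (e.g. from DistributedCache.getLocalCacheFiles(conf)).
     */
    static Optional<String> findLocalJar(List<String> localPaths, String jarName) {
        for (String path : localPaths) {
            // Compare only the last path component, since the directory
            // part is chosen by the framework and is not known in advance.
            if (new File(path).getName().equals(jarName)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }
}
```

Calling CacheLookup.findLocalJar(paths, "profiler.jar") from setup() would then give you the absolute local path, if the jar was cached. Note that this runs inside a task JVM that has already started, so on its own it does not give you a path early enough for a JVM launch option.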

Can you give a better example? There may be a different way to handle this.

On Jul 31, 2012, at 3:50 PM, Stan Rosenberg <[EMAIL PROTECTED]> wrote:

> Forwarding to common-user to hopefully get more exposure...
>
>
> ---------- Forwarded message ----------
> From: Stan Rosenberg <[EMAIL PROTECTED]>
> Date: Tue, Jul 31, 2012 at 11:55 AM
> Subject: Re: task jvm bootstrapping via distributed cache
> To: [EMAIL PROTECTED]
>
>
> I am guessing this is either a well-known problem or an edge case.  In
> any case, would it be a bad idea to designate predetermined output
> paths?
> E.g., DistributedCache.addCacheFileInto(uri, conf, outputPath) would
> attempt to copy the cached file into the specified path resolving to a
> task's local filesystem.
>
> Thanks,
>
> stan
>
> On Mon, Jul 30, 2012 at 6:23 PM, Stan Rosenberg
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I am seeking a way to leverage hadoop's distributed cache in order to
>> ship jars that are required to bootstrap a task's jvm, i.e., before a
>> map/reduce task is launched.
>> As a concrete example, let's say that I need to launch with
>> '-javaagent:/path/profiler.jar'.  In theory, the task tracker is
>> responsible for downloading cached files onto its local filesystem.
>> However, the absolute path to a given cached file is not known a
>> priori, and we need that path in order to configure '-javaagent'.
>>
>> Is this currently possible with the distributed cache? If not, is the
>> use case appealing enough to open a jira ticket?
>>
>> Thanks,
>>
>> stan
>