MapReduce >> mail # user >> task jvm bootstrapping via distributed cache


Stan Rosenberg 2012-07-30, 22:23
Stan Rosenberg 2012-07-31, 15:55
Arun C Murthy 2012-08-03, 06:39
Stan Rosenberg 2012-08-03, 16:32
Harsh J 2012-08-03, 17:31
Stan Rosenberg 2012-08-03, 18:32
Stan Rosenberg 2013-01-17, 19:32

Re: task jvm bootstrapping via distributed cache
Hi,

As I suspected, cache files are symlinked after the child JVM is
started: TaskRunner.setupWorkDir is called from
org.apache.hadoop.mapred.Child.main.
This is unfortunate, as it makes it impossible to leverage the distributed
cache for deploying JVM agents.  I could submit a JIRA
if there is any interest in getting this to work.
Otherwise, I'll think of some other hacks and use a distributed scp as
a last resort.

Thanks,

stan
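
The ordering problem can be illustrated outside Hadoop. The sketch below is
not Hadoop code; the class and method names (AgentPathCheck, agentJars,
agentsPresent) are hypothetical. It assumes only the standard
-javaagent:jarpath[=options] syntax and checks whether each agent jar named
in a set of JVM options exists relative to a given working directory, which
is the condition that has to hold at the moment the child JVM is launched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class AgentPathCheck {
    /** Extracts the jar path from each -javaagent:jarpath[=options] token. */
    static List<String> agentJars(String javaOpts) {
        List<String> jars = new ArrayList<>();
        for (String token : javaOpts.trim().split("\\s+")) {
            if (token.startsWith("-javaagent:")) {
                String rest = token.substring("-javaagent:".length());
                int eq = rest.indexOf('=');
                // Everything before the first '=' is the jar path.
                jars.add(eq < 0 ? rest : rest.substring(0, eq));
            }
        }
        return jars;
    }

    /** True if every agent jar resolves to an existing file under workDir. */
    static boolean agentsPresent(Path workDir, String javaOpts) {
        for (String jar : agentJars(javaOpts)) {
            // isRegularFile follows symlinks, so a valid symlink counts.
            if (!Files.isRegularFile(workDir.resolve(jar).normalize())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String opts = "-javaagent:./foo.jar=classes=.*";
        Path workDir = Files.createTempDirectory("task-cwd");

        // Before the file (or symlink) exists, the check fails.
        System.out.println(agentsPresent(workDir, opts)); // false

        // Once foo.jar is in place, the same options resolve.
        Files.createFile(workDir.resolve("foo.jar"));
        System.out.println(agentsPresent(workDir, opts)); // true
    }
}
```

With the symlinks created from Child.main, the first case is the one the JVM
sees when it processes -javaagent at startup.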

On Thu, Jan 17, 2013 at 2:32 PM, Stan Rosenberg
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am back with my original problem.  I am trying to bootstrap the child
> JVM via -javaagent.  I am doing what Harsh and Arun suggested, which
> also agrees with the documentation.
> In theory this should work, but it doesn't.  Any ideas before I start
> digging into the code? Thanks.
>
> Here is the command I am using to test:
>
> hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount
> -files "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar"
> -Dmapred.map.child.java.opts="-javaagent:./foo.jar=classes=.*" test1
> output
>
> I can see the following (relevant) properties set in job.xml,
>
> mapred.cache.files=/user/srosenberg/.staging/job_201211061805_50132/files/core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar
> mapred.create.symlink=yes
> mapred.map.child.java.opts=-javaagent:./foo.jar=classes=.*
>
> The map tasks fail with the following stdout/stderr output, resp.,
>
> Error occurred during initialization of VM
> agent library failed to init: instrument
>
> Error opening zip file or JAR manifest missing : ./foo.jar
>
> It seems the jar is not symlinked into the current working
> directory of the child JVM; or perhaps the symlinking happens after
> the child JVM starts?
>
>
>
>
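
The "Error opening zip file or JAR manifest missing" message above does not
say which condition applies. A minimal sketch for telling the cases apart on
a node, using only java.util.jar, might look as follows. The class name
AgentJarDiagnosis, the method diagnose, and the agent class name
example.HypotheticalAgent are made up for illustration; Premain-Class is the
manifest attribute a -javaagent jar is expected to declare.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class AgentJarDiagnosis {
    /** Classifies why a jar might be rejected as a Java agent. */
    static String diagnose(Path jar) {
        if (!Files.exists(jar)) {
            return "missing";
        }
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return "no manifest";
            }
            String premain = mf.getMainAttributes().getValue("Premain-Class");
            return premain == null ? "no Premain-Class" : "ok";
        } catch (IOException e) {
            // Not a readable zip/jar file.
            return "unreadable";
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("agent-check");

        // A path that was never created, like ./foo.jar before symlinking.
        System.out.println(diagnose(dir.resolve("foo.jar"))); // missing

        // A manifest-only jar that declares a (hypothetical) agent class.
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().putValue("Premain-Class", "example.HypotheticalAgent");
        Path jar = dir.resolve("agent.jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, mf)) {
            // No entries besides the manifest are needed for this check.
        }
        System.out.println(diagnose(jar)); // ok
    }
}
```

In the failing job described above, the "missing" case is the likely one,
since the same jar is valid when referenced by an absolute path.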
> On Fri, Aug 3, 2012 at 1:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Stan,
>>
>> What Arun says would surely work.
>>
>> For instance, read this command:
>>
>> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi
>> -files
>> "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar"
>> -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1
>>
>> What this would do is merely take your passed -files jar (client-common) and
>> symlink it into the JVM's working directory (the task's working directory)
>> _before_ the JVM is started, as "foo.jar". So if I additionally pass JVM opts
>> that refer to this foo.jar under ./, then it would work as you expect it to,
>> as the JVM is started from that directory (its CWD).
>>
>> Do let us know if this solves it and also makes sense?
>>
>>
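
The "#foo.jar" suffix in the -files argument follows ordinary URI syntax: the
part before '#' is the path of the file to ship, and the fragment after it is
the name requested for the symlink. The sketch below only demonstrates that
split with java.net.URI; it is not Hadoop's own parsing code, and the class
and method names (CacheFileLink, sourcePath, linkName) are hypothetical.

```java
import java.net.URI;

class CacheFileLink {
    /** Path of the file to be shipped (the URI without its fragment). */
    static String sourcePath(String cacheFile) {
        return URI.create(cacheFile).getPath();
    }

    /** Requested symlink name, or null if the URI has no '#fragment'. */
    static String linkName(String cacheFile) {
        return URI.create(cacheFile).getFragment();
    }

    public static void main(String[] args) {
        String entry = "share/hadoop/mapreduce/"
                + "hadoop-mapreduce-client-common-2.0.0.jar#foo.jar";
        System.out.println(sourcePath(entry));
        System.out.println(linkName(entry)); // foo.jar
    }
}
```

The JVM option -javaagent:./foo.jar then refers to the link name, not to the
original jar file name.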
>> On Fri, Aug 3, 2012 at 10:02 PM, Stan Rosenberg <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Arun,
>>>
>>> I don't believe the symlink is of help.  The symlink is created in the
>>> task's current working directory (cwd), but I don't know what cwd is
>>> when I launch with 'hadoop jar ...'.
>>>
>>> Thanks,
>>>
>>> stan
>>>
>>> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>>> > Stan,
>>> >
>>> >  You can ask TT to create a symlink to your jar shipped via DistCache:
>>> >
>>> >
>>> > http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>>> >
>>> >  That should give you what you want.
>>> >
>>> > hth,
>>> > Arun
>>> >
>>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am seeking a way to leverage hadoop's distributed cache in order to
>>> > ship jars that are required to bootstrap a task's jvm, i.e., before a
>>> > map/reduce task is launched.
>>> > As a concrete example, let's say that I need to launch with
>>> > '-javaagent:/path/profiler.jar'.  In theory, the task tracker is
>>> > responsible for downloading cached files onto its local filesystem.
>>> > However, the absolute path to a given cached file is not known a
>>> > priori, yet we need the path in order to configure '-javaagent'.
>>> >
>>> > Is this currently possible with the distributed cache? If not, is the
>>> > use case appealing enough to open a jira ticket?
Arun C Murthy 2012-08-03, 20:19
Stan Rosenberg 2012-08-03, 20:57
rahul p 2012-08-05, 05:35