MapReduce >> mail # user >> task jvm bootstrapping via distributed cache


Stan Rosenberg 2012-07-30, 22:23
Stan Rosenberg 2012-07-31, 15:55
Arun C Murthy 2012-08-03, 06:39
Stan Rosenberg 2012-08-03, 16:32
Harsh J 2012-08-03, 17:31
Stan Rosenberg 2012-08-03, 18:32

Re: task jvm bootstrapping via distributed cache
Hi,

I am back with my original problem.  I am trying to bootstrap the child
JVM via -javaagent.  I am doing what Harsh and Arun suggested, which
also agrees with the documentation.  In theory this should work, but it
doesn't.  Any ideas before I start digging into the code? Thanks.

Here is the command I am using to test:

hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount
-files "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar"
-Dmapred.map.child.java.opts="-javaagent:./foo.jar=classes=.*" test1
output

I can see the following (relevant) properties set in job.xml:

mapred.cache.files=/user/srosenberg/.staging/job_201211061805_50132/files/core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar
mapred.create.symlink=yes
mapred.map.child.java.opts=-javaagent:./foo.jar=classes=.*
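
As a small illustration of how the cache entry above relates to the name used in the JVM options: the DistributedCache documentation says the URI fragment (the part after '#') is used as the symlink name. The sketch below only parses the string with the JDK's java.net.URI; the class name CacheLinkName and the method linkName are hypothetical helpers for this example, not Hadoop APIs.

```java
import java.net.URI;

// Hypothetical helper (not part of Hadoop): extracts the symlink name
// that a cache entry of the form "<path>#<name>" asks for.
final class CacheLinkName {

    // Returns the URI fragment (text after '#'), or null if there is none.
    static String linkName(String cacheEntry) {
        return URI.create(cacheEntry).getFragment();
    }

    public static void main(String[] args) {
        String entry = "/user/srosenberg/.staging/job_201211061805_50132/files/"
                + "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar";
        // Prints: foo.jar
        System.out.println(linkName(entry));
    }
}
```

So the value in mapred.cache.files is at least consistent with the ./foo.jar path given to -javaagent; the open question is whether that link exists when the child JVM starts.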

The map tasks fail with the following stdout and stderr output, respectively:

Error occurred during initialization of VM
agent library failed to init: instrument

Error opening zip file or JAR manifest missing : ./foo.jar
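
The stderr message covers two different failures: the file could not be opened at that path, or it was opened but has no usable manifest (an agent jar needs a Premain-Class entry, per the java.lang.instrument documentation). A minimal sketch for telling these apart, assuming it is run from the task's working directory; the class name AgentJarCheck and the method diagnose are hypothetical, and the default path ./foo.jar is taken from the command above.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical diagnostic (not part of Hadoop or the JDK tools): reports
// why a jar might be rejected as a -javaagent argument.
final class AgentJarCheck {

    static String diagnose(File jar) {
        // exists() follows symlinks, so a dangling link also reports "missing".
        if (!jar.exists()) {
            return "missing: " + jar.getAbsolutePath();
        }
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return "no manifest";
            }
            String premain = mf.getMainAttributes().getValue("Premain-Class");
            return premain == null ? "no Premain-Class" : "ok: " + premain;
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File jar = new File(args.length > 0 ? args[0] : "./foo.jar");
        // The working directory matters because the agent path is relative.
        System.out.println("cwd = " + System.getProperty("user.dir"));
        System.out.println(diagnose(jar));
    }
}
```

A "missing" result would point at the symlink or working directory, while "no manifest" or "no Premain-Class" would point at how the jar was assembled.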

It seems that the jar is not symlinked into the current working
directory of the child JVM; or perhaps the symlinking happens after
the child JVM starts?

On Fri, Aug 3, 2012 at 1:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Stan,
>
> What Arun says would surely work.
>
> For instance, read this command:
>
> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi
> -files
> "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar"
> -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1
>
> What this does is take the jar you passed via -files (client-common) and
> symlink it as "foo.jar" into the task's working directory _before_ the JVM
> is started. So if I additionally pass JVM opts that refer to this foo.jar
> under ./, it works as you expect, because the JVM is started from that
> directory (its CWD).
>
> Do let us know whether this solves it and makes sense.
>
>
> On Fri, Aug 3, 2012 at 10:02 PM, Stan Rosenberg <[EMAIL PROTECTED]>
> wrote:
>>
>> Arun,
>>
>> I don't believe the symlink is of help.  The symlink is created in the
>> task's current working directory (cwd), but I don't know what cwd is
>> when I launch with 'hadoop jar ...'.
>>
>> Thanks,
>>
>> stan
>>
>> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> > Stan,
>> >
>> >  You can ask the TaskTracker (TT) to create a symlink to your jar shipped via DistCache:
>> >
>> >
>> > http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>> >
>> >  That should give you what you want.
>> >
>> > hth,
>> > Arun
>> >
>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>> >
>> > Hi,
>> >
>> > I am seeking a way to leverage hadoop's distributed cache in order to
>> > ship jars that are required to bootstrap a task's jvm, i.e., before a
>> > map/reduce task is launched.
>> > As a concrete example, let's say that I need to launch with
>> > '-javaagent:/path/profiler.jar'.  In theory, the task tracker is
>> > responsible for downloading cached files onto its local filesystem.
>> > However, the absolute path to a given cached file is not known a
>> > priori, and we need that path in order to configure '-javaagent'.
>> >
>> > Is this currently possible with the distributed cache? If not, is the
>> > use case appealing enough to open a jira ticket?
>> >
>> > Thanks,
>> >
>> > stan
>> >
>> >
>> > --
>> > Arun C. Murthy
>> > Hortonworks Inc.
>> > http://hortonworks.com/
>> >
>> >
>
>
>
>
> --
> Harsh J
Stan Rosenberg 2013-01-18, 01:28
Arun C Murthy 2012-08-03, 20:19
Stan Rosenberg 2012-08-03, 20:57
rahul p 2012-08-05, 05:35