Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2013-01-17, 19:32
I am back with my original problem. I am trying to bootstrap the child
JVM via -javaagent. I am doing what Harsh and Arun suggested, which
also agrees with the documentation.
In theory this should work, but it doesn't. Any ideas before I start
digging into the code? Thanks.
Here is the command I am using to test:
hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount
I can see the following (relevant) properties set in job.xml,
The map tasks fail with the following stdout/stderr output, respectively:
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : ./foo.jar
It looks as if the jar is not symlinked into the current working
directory of the child JVM; or perhaps the symlinking happens after
the child JVM starts?
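
One way to narrow this down is a small check that reports the working directory and whether ./foo.jar resolves to a readable jar with an agent manifest. The sketch below is only an illustration: the class name AgentJarCheck and the method diagnose are hypothetical, and it assumes the check can be run from the task's working directory (for example, called from a task in a job submitted with the same cached file but without the -javaagent option, since the failing JVM never reaches user code). It shows whether the symlink exists while the task runs, not whether it existed before the JVM started.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical diagnostic helper; not part of Hadoop.
public class AgentJarCheck {

    /** Returns a one-line diagnosis for the given agent jar path. */
    static String diagnose(File jar) {
        File abs = jar.getAbsoluteFile();
        // exists() is also false for a dangling symlink.
        if (!abs.exists()) {
            return "MISSING: " + abs;
        }
        try (JarFile jf = new JarFile(abs)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return "NO_MANIFEST: " + abs;
            }
            // -javaagent requires a Premain-Class entry in the manifest.
            String premain = mf.getMainAttributes().getValue("Premain-Class");
            if (premain == null) {
                return "NO_PREMAIN_CLASS: " + abs;
            }
            return "OK: " + abs + " Premain-Class=" + premain;
        } catch (IOException e) {
            // Thrown when the file is not a valid zip/jar or cannot be read.
            return "UNREADABLE: " + abs + " (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // "./foo.jar" matches the symlink name used in this thread.
        String path = args.length > 0 ? args[0] : "./foo.jar";
        System.out.println("cwd = " + System.getProperty("user.dir"));
        System.out.println(diagnose(new File(path)));
    }
}
```

A MISSING result at task run time would point at the symlink not being created at all; an OK result would suggest the link exists by then and the question becomes one of timing relative to JVM startup.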
On Fri, Aug 3, 2012 at 1:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> What Arun says would surely work.
> For instance, read this command:
> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi
> -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1
> What this would do is merely take your passed -files jar (client-common) and
> symlink it into the JVM's working directory (the task's working directory)
> _before_ the JVM is begun, as "foo.jar". So if I pass additionally, JVM opts
> that refer to this foo.jar under ./, then it would work as you expect it to,
> as the JVM is begun from that directory (its CWD).
> Do let us know if this solves it and also makes sense?
> On Fri, Aug 3, 2012 at 10:02 PM, Stan Rosenberg <[EMAIL PROTECTED]> wrote:
>> I don't believe the symlink is of help. The symlink is created in the
>> task's current working directory (cwd), but I don't know what cwd is
>> when I launch with 'hadoop jar ...'.
>> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> > Stan,
>> > You can ask TT to create a symlink to your jar shipped via DistCache:
>> > http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>> > That should give you what you want.
>> > hth,
>> > Arun
>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>> > Hi,
>> > I am seeking a way to leverage hadoop's distributed cache in order to
>> > ship jars that are required to bootstrap a task's jvm, i.e., before a
>> > map/reduce task is launched.
>> > As a concrete example, let's say that I need to launch with
>> > '-javaagent:/path/profiler.jar'. In theory, the task tracker is
>> > responsible for downloading cached files onto its local filesystem.
>> > However, the absolute path to a given cached file is not known a
>> > priori, yet we need that path in order to configure '-javaagent'.
>> > Is this currently possible with the distributed cache? If not, is the
>> > use case appealing enough to open a jira ticket?
>> > Thanks,
>> > stan
>> > --
>> > Arun C. Murthy
>> > Hortonworks Inc.
>> > http://hortonworks.com/
> Harsh J