Stan Rosenberg
2012-07-30, 22:23
Stan Rosenberg
2012-07-31, 15:55
Arun C Murthy
2012-08-03, 06:39
Stan Rosenberg
2012-08-03, 16:32
Harsh J
2012-08-03, 17:31
Stan Rosenberg
2012-08-03, 18:32
Stan Rosenberg
2013-01-17, 19:32
Stan Rosenberg
2013-01-18, 01:28
Arun C Murthy
2012-08-03, 20:19
Stan Rosenberg
2012-08-03, 20:57
rahul p
2012-08-05, 05:35
task jvm bootstrapping via distributed cache
Stan Rosenberg 2012-07-30, 22:23

Hi,

I am seeking a way to leverage Hadoop's distributed cache in order to ship jars that are required to bootstrap a task's JVM, i.e., before a map/reduce task is launched. As a concrete example, let's say that I need to launch with '-javaagent:/path/profiler.jar'. In theory, the task tracker is responsible for downloading cached files onto its local filesystem. However, the absolute path to a given cached file is not known a priori, and we need that path in order to configure '-javaagent'.

Is this currently possible with the distributed cache? If not, is the use case appealing enough to open a JIRA ticket?

Thanks,

stan

Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2012-07-31, 15:55

I am guessing this is either a well-known problem or an edge case. In any case, would it be a bad idea to designate predetermined output paths? E.g., DistributedCache.addCacheFileInto(uri, conf, outputPath) would attempt to copy the cached file into the specified path resolving to a task's local filesystem.

Thanks,

stan

Re: task jvm bootstrapping via distributed cache
Arun C Murthy 2012-08-03, 06:39

Stan,

You can ask the TT to create a symlink to your jar shipped via DistCache:

http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache

That should give you what you want.

hth,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

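To make the symlink suggestion concrete: per the tutorial linked above, the distributed cache takes the name of the symlink from the fragment of the cache file's URI (the part after '#'). The following is only a minimal sketch of that naming convention using java.net.URI; the class name CacheUriParts and the example URI are hypothetical, and no Hadoop API is called.

```java
import java.net.URI;

/** Illustrative only: splits a cache-file URI into the cached path and the symlink name. */
public class CacheUriParts {

    /** Path of the file that is shipped through the cache. */
    static String cachedPath(String cacheUri) {
        return URI.create(cacheUri).getPath();
    }

    /** Name requested for the symlink, or null when the URI has no '#fragment'. */
    static String linkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical URI, modelled on the '/path/profiler.jar' example in this thread.
        String uri = "/path/profiler.jar#profiler.jar";
        System.out.println("cached file: " + cachedPath(uri));
        System.out.println("link name:   " + linkName(uri));
    }
}
```

The link name is what a relative JVM option would refer to; the real location of the cached file stays an implementation detail of the task tracker.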
Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2012-08-03, 16:32

Arun,

I don't believe the symlink is of help. The symlink is created in the task's current working directory (cwd), but I don't know what the cwd is when I launch with 'hadoop jar ...'.

Thanks,

stan

Re: task jvm bootstrapping via distributed cache
Harsh J 2012-08-03, 17:31

Stan,

What Arun says would surely work.

For instance, read this command:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi -files "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar" -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1

What this would do is merely take the jar you passed via -files (client-common) and symlink it, as "foo.jar", into the task's working directory (which is also the JVM's working directory) _before_ the JVM is started. So if I additionally pass JVM opts that refer to this foo.jar under ./, it would work as you expect, as the JVM is started from that directory (its CWD).

Do let us know if this solves it and also makes sense?

--
Harsh J

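The part of the command above that is easy to get wrong is the pairing between the '#foo.jar' fragment given to -files and the relative './foo.jar' path in the JVM options. Below is a minimal sketch that assembles such an argument list from a single link name so the two cannot drift apart. The class and method names are hypothetical; only the flags and the property name shown in the message are used. It merely builds the argument list; whether the symlink is actually in place when the task JVM starts is up to the Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: builds a 'hadoop jar' argument list that ships an agent jar via -files. */
public class AgentJobCommand {

    static List<String> build(String jobJar, String program, String agentJar,
                              String linkName, List<String> programArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add(program);
        // The fragment after '#' is the name requested for the symlink in the task's cwd.
        cmd.add("-files");
        cmd.add(agentJar + "#" + linkName);
        // The agent path is relative, so it is resolved against the task's cwd.
        cmd.add("-Dmapred.child.java.opts=-javaagent:./" + linkName);
        cmd.addAll(programArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar",
                "pi",
                "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar",
                "foo.jar",
                Arrays.asList("1", "1"));
        System.out.println(String.join(" ", cmd));
    }
}
```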
Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2012-08-03, 18:32

On Fri, Aug 3, 2012 at 1:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> What this would do is merely take your passed -files jar (client-common) and
> symlink it into the JVM's working directory (the task's working directory)
> _before_ the JVM is begun, as "foo.jar". So if I pass additionally, JVM opts
> that refer to this foo.jar under ./, then it would work as you expect it to,
> as the JVM is begun from that directory (its CWD).

The fact that the JVM is executed relative to the task's cwd completely escaped me :) Thanks for the clarification! It should definitely work.

Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2013-01-17, 19:32

Hi,

I am back with my original problem. I am trying to bootstrap the child JVM via -javaagent. I am doing what Harsh and Arun suggested, which also agrees with the documentation. In theory this should work, but it doesn't. Any ideas before I start digging into the code? Thanks.

Here is the command I am using to test:

hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount -files "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar" -Dmapred.map.child.java.opts="-javaagent:./foo.jar=classes=.*" test1 output

I can see the following (relevant) properties set in job.xml:

mapred.cache.files=/user/srosenberg/.staging/job_201211061805_50132/files/core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar
mapred.create.symlink=yes
mapred.map.child.java.opts=-javaagent:./foo.jar=classes=.*

The map tasks fail with the following stdout/stderr output, resp.:

Error occurred during initialization of VM
agent library failed to init: instrument

Error opening zip file or JAR manifest missing : ./foo.jar

It seems that the jar is not symlinked into the current working directory of the child JVM; or perhaps the symlinking happens after the child JVM starts?

Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2013-01-18, 01:28

Hi,

As I suspected, cache files are symlinked after the child JVM is started: TaskRunner.setupWorkDir is called from org.apache.hadoop.mapred.Child.main. This is unfortunate, as it makes it impossible to leverage the distributed cache for the purpose of deploying JVM agents. I could submit a JIRA if there is any interest in getting this to work. Otherwise, I'll think of some other hacks and use a distributed scp as a last resort.

Thanks,

stan

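The ordering problem described in this message can be reproduced in miniature without Hadoop. The JVM resolves -javaagent paths during its own startup, before any main method runs, so a symlink that is only created from inside Child.main arrives too late. The sketch below stands in for that sequence with temporary directories: it checks whether './foo.jar' resolves in a fresh working directory, then creates the symlink (a stand-in for the work-directory setup step), then checks again. All names are hypothetical, nothing here calls Hadoop, and it assumes a filesystem that permits symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: shows why a symlink created after JVM startup cannot satisfy -javaagent. */
public class SymlinkOrderingDemo {

    /** Stand-in for the JVM's startup check: the agent jar must already resolve from the cwd. */
    static boolean agentResolvable(Path workDir, String relativeAgentPath) {
        return Files.exists(workDir.resolve(relativeAgentPath).normalize());
    }

    /** Returns {resolvable before work-dir setup, resolvable after work-dir setup}. */
    static boolean[] simulate() {
        try {
            Path cacheDir = Files.createTempDirectory("distcache");
            Path workDir = Files.createTempDirectory("taskwork");
            Path cachedJar = Files.createFile(cacheDir.resolve("agent.jar"));

            // Point at which the child JVM would try to load its agent.
            boolean before = agentResolvable(workDir, "./foo.jar");

            // Stand-in for the symlinking step, which here runs only afterwards.
            Files.createSymbolicLink(workDir.resolve("foo.jar"), cachedJar);
            boolean after = agentResolvable(workDir, "./foo.jar");

            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = simulate();
        System.out.println("resolvable before setup: " + result[0]);
        System.out.println("resolvable after setup:  " + result[1]);
    }
}
```

For the approach from earlier in the thread to work, the symlinking step would have to run before the child process is spawned rather than inside it.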
Re: task jvm bootstrapping via distributed cache
Arun C Murthy 2012-08-03, 20:19

Just do -javaagent:./profiler.jar?

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: task jvm bootstrapping via distributed cache
Stan Rosenberg 2012-08-03, 20:57

On Fri, Aug 3, 2012 at 4:19 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Just do -javaagent:./profiler.jar?

Yep, that should work. Thanks!

Re: task jvm bootstrapping via distributed cache
rahul p 2012-08-05, 05:35

Hi Arun,

I am new to Hadoop and big data. Can you help me get started with the basics? My experience is in ETL and BI/DWH.

Rahul