Re: Child JVM memory allocation / Usage
A couple of things to check:

Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
interface? You can look at an example at
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0.
That's what accepts the -D params on the command line. Alternatively, you can
also set the same properties on the configuration object in your launcher
code, like this:

Configuration conf = new Configuration();

conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files",
    "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
conf.set("mapred.child.java.opts",
    "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError"
    + " -XX:HeapDumpPath=./heapdump.hprof"
    + " -XX:OnOutOfMemoryError=./copy_dump.sh");
Second, the position of the arguments matters: the -D options must come before
the program arguments (Fudan\ Univ). I think the command should be

hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher \
  -Dmapred.create.symlink=yes \
  -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh \
  -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' \
  Fudan\ Univ

Thanks
Hemanth
On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
[EMAIL PROTECTED]> wrote:

> Hi Hemanth/Koji,
>
> It seems the above script doesn't work for me. Can you look into the following
> and suggest what more I can do?
>
>
>  hadoop fs -cat /user/ims-b/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>
>
> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>  -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
>
> I am not able to see the heap dump at  /tmp/myheapdump_ims
>
>
>
> Error in the mapper:
>
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2734)
> at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> at java.util.ArrayList.add(ArrayList.java:351)
> at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
> ... 22 more
>
>
>
>
>
> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
> [EMAIL PROTECTED]> wrote:
>
>> Koji,
>>
>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>> with your script today!
>>
>> Hemanth
>>
>>
>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <[EMAIL PROTECTED]> wrote:
>>
>>> Create a dump.sh on hdfs.
>>>
>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof
>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>
>>> Run your job with
>>>
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>
>>> Koji
>>>
>>>
>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>
>>> > Hi,
>>> >
>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError option. Unfortunately,
>>> > as I suspected, the dump goes to the current working directory of the task
>>> > attempt as it executes on the cluster. This directory is cleaned up once
>>> > the task is done. There are options to keep failed task files, or task files
>>> > matching a pattern. However, these do NOT retain the current working
>>> > directory. Hence, there is no option to get this from a cluster AFAIK.
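
For completeness, the "keep failed task files / keep task files matching a
pattern" options mentioned above can be set through the old mapred JobConf API,
roughly as in the sketch below. The pattern is a made-up example, and, as noted
in the thread, these options keep task files around but do not retain the
attempt's working directory, so a heap dump written there is still lost:

import org.apache.hadoop.mapred.JobConf;

public class KeepTaskFilesExample {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Backed by the keep.failed.task.files property.
    job.setKeepFailedTaskFiles(true);
    // Backed by the keep.task.files.pattern property; example pattern only.
    job.setKeepTaskFilesPattern(".*_m_000000_0");
    System.out.println("keep.failed.task.files = " + job.get("keep.failed.task.files"));
  }
}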