MapReduce Child doesn't exit?
hi all,

We are running hadoop-0.19.1 on about 200 nodes. On many slaves, Child
processes keep running even after the job is done.

Here is an example; this process has been running since Aug 09!
> 1000     24625     1  0 Aug09 ?        00:00:38 (...java... classpath)
> org.apache.hadoop.mapred.Child 127.0.0.1 55998
> attempt_200908081205_0054_r_000093_0 441920924
The jstack output for the process is:
> 2009-11-12 14:58:59
> Full thread dump Java HotSpot(TM) Server VM (11.0-b15 mixed mode):
>
> "Attach Listener" daemon prio=10 tid=0x08168400 nid=0x457a waiting on
> condition [0x00000000..0x00000000]
>    java.lang.Thread.State: RUNNABLE
>
> "Thread-2" daemon prio=10 tid=0x08170400 nid=0x60f8 waiting for monitor
> entry [0xa33ad000..0xa33adfd0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3085)
>         - waiting to lock <0xa84d12a8> (a
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>         at
> org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
>         - locked <0xa84cba48> (a
> org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:209)
>         - locked <0xa84cba60> (a org.apache.hadoop.hdfs.DFSClient)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:264)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
>         - locked <0xa84a1e00> (a org.apache.hadoop.fs.FileSystem$Cache)
>         at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
>         at
> org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)
>         - locked <0xa84a26f0> (a
> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>
> "SIGTERM handler" daemon prio=10 tid=0x08176800 nid=0x60f6 in Object.wait()
> [0xa35ad000..0xa35ae0d0]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa84a26f0> (a
> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>         at java.lang.Thread.join(Thread.java:1143)
>         - locked <0xa84a26f0> (a
> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>         at java.lang.Thread.join(Thread.java:1196)
>         at
> java.lang.ApplicationShutdownHooks.run(ApplicationShutdownHooks.java:79)
>         at java.lang.Shutdown.runHooks(Shutdown.java:89)
>         at java.lang.Shutdown.sequence(Shutdown.java:133)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - locked <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Terminator$1.handle(Terminator.java:35)
>         at sun.misc.Signal$1.run(Signal.java:195)
>         at java.lang.Thread.run(Thread.java:619)
>
> "Comm thread for attempt_200908081205_0054_r_000093_0" daemon prio=10
> tid=0x083f0000 nid=0x6049 waiting for monitor entry [0xa35fe000..0xa35ff050]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - waiting to lock <0xa4556020> (a java.lang.Class for
> java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:90)
>         at java.lang.System.exit(System.java:906)
>         at org.apache.hadoop.mapred.Task$1.run(Task.java:430)
>         at java.lang.Thread.run(Thread.java:619)
>
> "Thread for syncLogs" daemon prio=10 tid=0xa39cc800 nid=0x6041 waiting for
> monitor entry [0xa38a3000..0xa38a3fd0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - waiting to lock <0xa4556020> (a java.lang.Class for
> java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:90)
>         at java.lang.System.exit(System.java:906)
>         at org.apache.hadoop.mapred.Child$1.run(Child.java:84)
>
> "Low Memory Detector" daemon prio=10 tid=0x0811c800 nid=0x603e runnable
It seems the process is blocked in the DFS client. Can anyone tell me how to
avoid this?
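
To make the blocking chain in the dump above easier to read, one option is to
map every "waiting to lock" address to the thread that reports it as "locked".
The following is only a rough sketch: the class name DumpLocks and the method
blockedOn are made up for illustration, and it assumes the HotSpot text format
shown above (optionally with the leading "> " quote marks). A "locked" line
for an address the same thread is "waiting on" is skipped, because
Object.wait() releases that monitor.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or the JDK.
public class DumpLocks {
    private static final Pattern THREAD = Pattern.compile("^\"([^\"]+)\"");
    private static final Pattern WANTS = Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");
    private static final Pattern WAIT_ON = Pattern.compile("- waiting on <(0x[0-9a-fA-F]+)>");
    private static final Pattern LOCKED = Pattern.compile("- locked <(0x[0-9a-fA-F]+)>");

    /** Returns thread name -> "address held by owner"; owner is "?" if no thread in the dump holds it. */
    public static Map<String, String> blockedOn(String dump) {
        Map<String, String> owner = new HashMap<>();        // monitor address -> holding thread
        Map<String, String> wants = new LinkedHashMap<>();  // blocked thread -> monitor address
        Set<String> released = new HashSet<>();             // monitors the current thread is wait()ing on
        String current = null;
        for (String raw : dump.split("\n")) {
            // Drop the mail quote prefix, if any, then surrounding whitespace.
            String line = raw.replaceFirst("^>\\s?", "").trim();
            Matcher m = THREAD.matcher(line);
            if (m.find()) {                 // a new thread section starts
                current = m.group(1);
                released.clear();
                continue;
            }
            if (current == null) {
                continue;                   // header lines before the first thread
            }
            m = WANTS.matcher(line);
            if (m.find()) {
                wants.put(current, m.group(1));
                continue;
            }
            m = WAIT_ON.matcher(line);
            if (m.find()) {
                released.add(m.group(1));
                continue;
            }
            m = LOCKED.matcher(line);
            if (m.find() && !released.contains(m.group(1))) {
                owner.put(m.group(1), current);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : wants.entrySet()) {
            String addr = e.getValue();
            result.put(e.getKey(), addr + " held by " + owner.getOrDefault(addr, "?"));
        }
        return result;
    }
}
```

Applied to the dump above, this should pair the "Comm thread" and "Thread for
syncLogs" threads with the java.lang.Shutdown class lock <0xa4556020> held by
"SIGTERM handler", which is itself joining the ClientFinalizer hook
("Thread-2"). "Thread-2" waits for <0xa84d12a8>, and no thread in the posted
part of the dump reports holding that monitor.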

Best Regards,

Ted Xu
Jason Venner 2009-11-17, 06:30
Ted Xu 2009-11-18, 01:54