datanodes not sending report
prem yadav 2013-01-07, 09:23
Hi,
We have been running Hadoop without many issues for some time. Today we had a problem where the datanodes' disks filled up and the cluster stopped working. We fixed things, modified the config to add directories to dfs.data.dir, and restarted. The Hadoop version is 1.0.4.

The issue is that the datanodes are not sending any block reports. There are no errors in the logs. The namenode shows 6 datanodes, but it never leaves safe mode and the report ratio never goes up from 0.000.

On one of the slaves, the jstack output is:

2013-01-07 09:13:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.5-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f40f0766800 nid=0x6268 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@207a0c69" daemon prio=10 tid=0x00007f40e001a000 nid=0x5f52 waiting on condition [0x00007f40d9219000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:620)
        at java.lang.Thread.run(Thread.java:722)

"IPC Server handler 2 on 50020" daemon prio=10 tid=0x00007f40e0017800 nid=0x5f51 waiting on condition [0x00007f40d931a000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server handler 1 on 50020" daemon prio=10 tid=0x00007f40e0015000 nid=0x5f50 waiting on condition [0x00007f40d941b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server handler 0 on 50020" daemon prio=10 tid=0x00007f40e0013000 nid=0x5f4f waiting on condition [0x00007f40d951c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server listener on 50020" daemon prio=10 tid=0x00007f40e000a000 nid=0x5f4e runnable [0x00007f40d961d000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000eeda0720> (a sun.nio.ch.Util$2)
        - locked <0x00000000eeda0710> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000eeda04d0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:439)

"IPC Server Responder" daemon prio=10 tid=0x00007f40e0008800 nid=0x5f4d runnable [0x00007f40d971e000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000eedc99e0> (a sun.nio.ch.Util$2)
        - locked <0x00000000eedc99d0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000eedc97b0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.apache.hadoop.ipc.Server$Responder.run(Server.java:605)

"org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@75a61582" daemon prio=10 tid=0x00007f40e0007000 nid=0x5f4c runnable [0x00007f40d981f000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:226)
        - locked <0x00000000eeddb870> (a java.lang.Object)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
        - locked <0x00000000eeddb838> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
        at java.lang.Thread.run(Thread.java:722)

"DataNode: [/data/hadoopfs,/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs]" daemon prio=10 tid=0x00007f40f0761000 nid=0x5f4b in Object.wait() [0x00007f40d9920000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000eeddb4f8> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1023)
        - locked <0x00000000eeddb4f8> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
        at java.lang.Thread.run(Thread.java:722)

"pool-1-thread-1" prio=10 tid=0x00007f40f075d800 nid=0x5f4a runnable [0x00007f40d9a21000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrappe...
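
A thread dump of this size is easier to scan once it is reduced to one state per thread. The following is a minimal sketch, not part of Hadoop or the JDK, that assumes the raw jstack output has been saved with its line breaks intact (for example with `jstack <pid> > dump.txt`). The function names `thread_states` and `summarize` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# A thread header line starts with the quoted thread name.
HEADER = re.compile(r'^"(?P<name>.+)"\s')
# The state line follows the header, e.g. "java.lang.Thread.State: WAITING (parking)".
STATE = re.compile(r'^\s*java\.lang\.Thread\.State:\s+(?P<state>[A-Z_]+)')


def thread_states(dump):
    """Map each thread name in a jstack dump to its java.lang.Thread.State."""
    states = {}
    current = None  # name of the thread whose state line we are waiting for
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = STATE.match(line)
        if state and current is not None:
            states[current] = state.group("state")
            current = None
    return states


def summarize(dump):
    """Count how many threads are in each state."""
    return Counter(thread_states(dump).values())
```

Calling `thread_states(open("dump.txt").read())` on a saved dump would return a dictionary keyed by thread name, which makes it quicker to spot which threads are sleeping or waiting, such as the main DataNode thread inside offerService in the dump above.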
prem yadav 2013-01-07, 13:10