HBase, mail # user - datanodes not sending report


datanodes not sending report
prem yadav 2013-01-07, 09:23
Hi,

We have been running hadoop without many issues for some time. Today we had
a problem where the datanodes' disks filled up and the cluster stopped
working.
We fixed things, modified the config to add directories to dfs.data.dir, and
restarted.

The hadoop version is 1.0.4.

The issue is:
the datanodes are not sending any block reports. There are no errors in the
logs. The namenode shows 6 datanodes but never leaves safe mode, and the
reported block ratio never rises above 0.000.

On one of the slaves, the jstack output is:

2013-01-07 09:13:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.5-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f40f0766800 nid=0x6268 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@207a0c69" daemon prio=10 tid=0x00007f40e001a000 nid=0x5f52 waiting on condition [0x00007f40d9219000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:620)
at java.lang.Thread.run(Thread.java:722)

"IPC Server handler 2 on 50020" daemon prio=10 tid=0x00007f40e0017800 nid=0x5f51 waiting on condition [0x00007f40d931a000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server handler 1 on 50020" daemon prio=10 tid=0x00007f40e0015000 nid=0x5f50 waiting on condition [0x00007f40d941b000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server handler 0 on 50020" daemon prio=10 tid=0x00007f40e0013000 nid=0x5f4f waiting on condition [0x00007f40d951c000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)

"IPC Server listener on 50020" daemon prio=10 tid=0x00007f40e000a000 nid=0x5f4e runnable [0x00007f40d961d000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000eeda0720> (a sun.nio.ch.Util$2)
- locked <0x00000000eeda0710> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000eeda04d0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
at org.apache.hadoop.ipc.Server$Listener.run(Server.java:439)

"IPC Server Responder" daemon prio=10 tid=0x00007f40e0008800 nid=0x5f4d runnable [0x00007f40d971e000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000eedc99e0> (a sun.nio.ch.Util$2)
- locked <0x00000000eedc99d0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000eedc97b0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.ipc.Server$Responder.run(Server.java:605)

"org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@75a61582" daemon prio=10 tid=0x00007f40e0007000 nid=0x5f4c runnable [0x00007f40d981f000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:226)
- locked <0x00000000eeddb870> (a java.lang.Object)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
- locked <0x00000000eeddb838> (a java.lang.Object)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:722)

"DataNode: [/data/hadoopfs,/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs]" daemon prio=10 tid=0x00007f40f0761000 nid=0x5f4b in Object.wait() [0x00007f40d9920000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000eeddb4f8> (a java.util.LinkedList)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1023)
- locked <0x00000000eeddb4f8> (a java.util.LinkedList)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
at java.lang.Thread.run(Thread.java:722)

"pool-1-thread-1" prio=10 tid=0x00007f40f075d800 nid=0x5f4a runnable [0x00007f40d9a21000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrappe
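
A long dump like the one above can be easier to scan when the threads are grouped by state. The following is only a rough sketch in Python, not part of any Hadoop tooling: the function name threads_by_state and the SAMPLE text (two threads copied from the dump above) are made up for illustration. It assumes each thread header starts with a double-quoted name and is followed by a "java.lang.Thread.State:" line, as jstack prints them. A name that mail wrapping has split across two lines will come out truncated at the wrap.

```python
import re
from collections import defaultdict

# A thread header starts with the quoted thread name, e.g. "IPC Server Responder" daemon ...
HEADER = re.compile(r'^"(?P<name>[^"]*)')
# The state line follows the header, e.g. "   java.lang.Thread.State: WAITING (parking)"
STATE = re.compile(r'java\.lang\.Thread\.State:\s+(?P<state>[A-Z_]+)')


def threads_by_state(dump):
    """Group thread names from a jstack dump by their reported thread state."""
    groups = defaultdict(list)
    current = None  # name of the thread whose state line we are waiting for
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = STATE.search(line)
        if state and current is not None:
            groups[state.group("state")].append(current)
            current = None
    return dict(groups)


# Hypothetical input: two threads copied from the dump in this message.
SAMPLE = '''"Attach Listener" daemon prio=10 tid=0x00007f40f0766800 nid=0x6268 waiting
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"IPC Server handler 2 on 50020" daemon prio=10 tid=0x00007f40e0017800
nid=0x5f51 waiting on condition [0x00007f40d931a000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    for state, names in threads_by_state(SAMPLE).items():
        print(state, names)
```

Run against the full jstack output saved to a file, this would list which threads are RUNNABLE, WAITING, or TIMED_WAITING. It only summarizes the dump and says nothing by itself about why block reports are missing.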
prem yadav 2013-01-07, 13:10