Re: datanodes not sending report
prem yadav 2013-01-07, 13:10
Sorry. I should have sent it to the hadoop list.
We have resolved the issue. Earlier, hadoop was picking up <dfs.tmp.dir>/dfs/data as the dfs data dir. Later, when we specified the <dfs.data.dir> property in the config, hadoop did not append /dfs/data to the path, so the datanode was looking for blocks directly in <dfs.data.dir>. We changed the path to include /dfs/data and it worked fine.

regards,
./Prem

On Mon, Jan 7, 2013 at 2:53 PM, prem yadav <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have been running hadoop without many issues for some time. Today we
> had a problem where the datanodes had their disks full and the cluster
> stopped working.
> We fixed things, modified the config to add directories to dfs.data.dir
> and restarted.
>
> The hadoop version is 1.0.4.
>
> The issue is:
> the datanodes are not sending any block reports. No errors in the logs.
> The namenode shows there are 6 datanodes but never leaves safe mode, and
> the report ratio never goes up from 0.000.
>
> On one of the slaves, the jstack output is:
>
> 2013-01-07 09:13:04
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.5-b02 mixed mode):
>
> "Attach Listener" daemon prio=10 tid=0x00007f40f0766800 nid=0x6268 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@207a0c69" daemon prio=10 tid=0x00007f40e001a000 nid=0x5f52 waiting on condition [0x00007f40d9219000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:620)
>     at java.lang.Thread.run(Thread.java:722)
>
> "IPC Server handler 2 on 50020" daemon prio=10 tid=0x00007f40e0017800 nid=0x5f51 waiting on condition [0x00007f40d931a000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)
>
> "IPC Server handler 1 on 50020" daemon prio=10 tid=0x00007f40e0015000 nid=0x5f50 waiting on condition [0x00007f40d941b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)
>
> "IPC Server handler 0 on 50020" daemon prio=10 tid=0x00007f40e0013000 nid=0x5f4f waiting on condition [0x00007f40d951c000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000eedc95b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1364)
>
> "IPC Server listener on 50020" daemon prio=10 tid=0x00007f40e000a000 nid=0x5f4e runnable [0x00007f40d961d000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
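
For anyone who hits the same symptom: the resolution at the top of this message comes down to dfs.data.dir pointing one level above where the existing blocks live. A rough way to check this before restarting a datanode is sketched below. It is only a sketch under assumptions: it assumes the Hadoop 1.x on-disk layout in which block files named blk_* sit under a `current` subdirectory of each data directory, and the function names (`read_data_dirs`, `count_block_files`, `report`) are made up for illustration and are not part of hadoop. Note also that in stock Hadoop 1.x the default for dfs.data.dir is `${hadoop.tmp.dir}/dfs/data`, which is most likely the default path referred to above.

```python
import os
import xml.etree.ElementTree as ET


def read_data_dirs(hdfs_site_path):
    """Return the comma-separated dfs.data.dir entries from a hadoop config file."""
    root = ET.parse(hdfs_site_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.data.dir":
            value = prop.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def count_block_files(data_dir):
    """Count blk_* files (ignoring .meta files) anywhere under data_dir/current.

    Assumption: blocks live under <data_dir>/current, as in Hadoop 1.x.
    A missing directory simply yields a count of 0.
    """
    count = 0
    for _path, _dirs, files in os.walk(os.path.join(data_dir, "current")):
        count += sum(
            1 for f in files if f.startswith("blk_") and not f.endswith(".meta")
        )
    return count


def report(hdfs_site_path):
    """Map each configured data directory to the number of block files found."""
    return {d: count_block_files(d) for d in read_data_dirs(hdfs_site_path)}
```

Calling `report()` with the path to your hdfs-site.xml on a slave returns one count per configured directory. A directory that held data before the config change but now reports 0 blocks is a hint that the configured path is missing the trailing /dfs/data.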