Hadoop >> mail # user >> Datanode error


Datanode error
Hey guys,
I have a cluster with 11 nodes (1 NameNode and 10 DataNodes) which is up and running.
However, my DataNodes keep hitting the same errors over and over.

I googled the problem and tried different JVM flags (e.g. -XX:MaxDirectMemorySize=2G)
and different configs (xceivers=8192), but could not solve it.

Does anyone know what the problem is and how I can solve it? (The stack traces are at the end.)
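For reference, here is where I put those settings; the property name below is the one used in the 0.20.x line (note the historical "xcievers" spelling), so please correct me if it should be something else:

```xml
<!-- hdfs-site.xml: raise the cap on concurrent DataXceiver threads per DataNode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
```

The JVM flag went into hadoop-env.sh via HADOOP_DATANODE_OPTS.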

I am running:
Java 1.7
Hadoop 0.20.2
HBase 0.90.6
ZooKeeper 3.3.5

% top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
% vmstat -> shows no swapping at all
% sar    -> shows 75% idle CPU in the worst case

Hope you guys can help me.
Thanks in advance,
Pablo

2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047, blockid: blk_914960691839012728_14061688, duration: 480061254006
2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to /DN01:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

2012-07-20 00:03:44,455 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

2012-07-20 00:12:11,949 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4602445008578088178_5707787
2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8916344806514717841_14081066 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
2012-07-20 00:12:11,962 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.DataInputStream.readShort(DataInputStream.java:312)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
2012-07-20 00:12:20,670 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7238561256016868237_3555939
2012-07-20 00:12:22,541 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-7028120671250332363_14081073 src: /DN03:50331 dest: /DN01:50010
2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-7028120671250332363_14081073 java.io.EOFException: while trying to read 65557 bytes
2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-7028120671250332363_14081073 Interrupted.
2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-7028120671250332363_14081073 terminating
2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7028120671250332363_14081073 received exception java.io.EOFException: while trying to read 65557 bytes
2012-07-20 00:12:22,544 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
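For what it's worth, the timeout values in those traces look like stock defaults rather than anything I set: 480000 ms matches the DataNode socket write timeout (8 minutes), and 63000 ms looks like the 60 s read timeout plus a small per-node pipeline extension. If raising them is a reasonable workaround, I assume it would look like this in hdfs-site.xml (property names as I understand them for 0.20.x; please correct me if they are wrong):

```xml
<!-- hdfs-site.xml: both values are in milliseconds -->
<property>
  <!-- write-side socket timeout; 480000 ms (8 min) is believed to be the default -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
<property>
  <!-- read-side socket timeout; 60000 ms is believed to be the default -->
  <name>dfs.socket.timeout</name>
  <value>120000</value>
</property>
```

I have not tried these yet, since bumping timeouts feels like masking the real cause rather than fixing it.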