Terribly long HDFS timeouts while appending to HLog
Varun Sharma 2012-11-07, 09:43
Hi,

I am seeing extremely long HDFS timeouts - and this seems to be associated
with the loss of a DataNode. Here is the RS log:

12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_2813460962462751946_78454java.io.IOException: Bad
response 1 for block blk_2813460962462751946_78454 from datanode
10.31.190.107:9200
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)

12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200,
10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
12/11/07 02:17:45 WARN wal.HLog: IPC Server handler 35 on 60020 took 65955
ms appending an edit to hlog; editcount=476686, len~=76.0
12/11/07 02:17:45 WARN wal.HLog: HDFS pipeline error detected. Found 2
replicas but expecting no less than 3 replicas.  Requesting close of hlog.
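
For context, the "expecting no less than 3 replicas" warning looks like the WAL's
low-replication check: when the current pipeline reports fewer replicas than the
tolerable minimum, the region server requests a log roll. If I am reading the
0.92/0.94-era code right, the knobs involved would be along these lines in
hbase-site.xml (values below are only illustrative, not what we run):

  <property>
    <name>hbase.regionserver.hlog.tolerable.lowreplication</name>
    <value>3</value>
    <!-- Minimum pipeline replicas before the WAL requests a roll;
         I believe it defaults to the filesystem's default replication. -->
  </property>
  <property>
    <name>hbase.regionserver.hlog.lowreplication.rolllimit</name>
    <value>5</value>
    <!-- Cap on consecutive rolls triggered by low replication. -->
  </property>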

The corresponding DN log goes like this

2012-11-07 02:17:45,142 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 2 for
Block blk_2813460962462751946_78454): PacketResponder
blk_2813460962462751946_78454 2 Exception java.net.SocketTimeoutException:
66000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.31.138.245:33965 remote=/10.31.190.107:9200]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:806)
        at java.lang.Thread.run(Thread.java:662)
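
The 66000 millis here presumably comes from the datanode socket read timeout
(dfs.socket.timeout, 60 seconds by default in this version, plus a small
per-downstream-node extension), so the pipeline sits for roughly that long
before the bad node is dropped. If I have the property names right, lowering
the socket timeouts in hdfs-site.xml would at least shorten the stall
(values are only illustrative):

  <property>
    <name>dfs.socket.timeout</name>
    <value>15000</value>
    <!-- Socket read timeout in ms for the DFS client and datanodes; default 60000. -->
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>15000</value>
    <!-- Datanode socket write timeout in ms; default 480000. -->
  </property>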

It seems like the DataNode local to the region server is waiting on a pipeline
ack for the block from another DN, and that read is timing out because this
other data node is bad. All in all, this makes response times terribly poor.
Is there a way around this, or am I missing something?

Varun