HBase, mail # user - Terribly long HDFS timeouts while appending to HLog


Varun Sharma 2012-11-07, 09:43
Nicolas Liochon 2012-11-07, 09:56
Jeremy Carroll 2012-11-07, 15:22
Jeremy Carroll 2012-11-07, 15:25
Varun Sharma 2012-11-07, 17:57
Re: Terribly long HDFS timeouts while appending to HLog
David Charle 2012-11-07, 18:21
You should upgrade to 0.94, as you also had issues with row locks; the newer version has an improved miniBatchPut code base.
On Nov 7, 2012, at 9:57 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Thanks for the response. One more point is that I am running hadoop 1.0.4
> with hbase 0.92 - not sure if that is known to have these issues.
>
> I had one quick question though - these logs are picked from 10.31.138.145
> and from my understanding of the logs below, it's still going to another bad
> datanode for retrieving the block even though it should already have the
> data block - see last line...
>
> 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
> exception  for block blk_2813460962462751946_78454java.io.IOException: Bad
> response 1 for block blk_2813460962462751946_78454 from datanode
> 10.31.190.107:9200
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
>
> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
> blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
> blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200,
> 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
>
> Looking at the DataNode logs - it seems that the local datanode is trying
> to connect to the remote bad datanode. Is this for replicating the WALEdit?
>
> 2012-11-07 02:17:45,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
> (PacketResponder 2 for Block blk_2813460962462751946_78454):
> PacketResponder blk_2813460962462751946_78454 2 Exception
> java.net.SocketTimeoutException:
> 66000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.31.138.245:33965
> remote=/10.31.190.107:9200]
>
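(A note on the 66-second figure above: in stock Hadoop 1.x the DataNode's timeout for reading pipeline acks is derived from dfs.socket.timeout, which defaults to 60000 ms, plus a small extension per downstream node, so a value in the mid-60-second range is what you would expect with a three-node pipeline. Below is a minimal hdfs-site.xml sketch showing where that base value lives; the property name and default are from memory of the 1.x configs, so verify them against your build before tuning.)

  <property>
    <name>dfs.socket.timeout</name>
    <!-- Base read timeout (ms) used by the DFSClient and by DataNodes
         waiting on pipeline acks; 60000 is the stock 1.x default.
         Raising it hides slow datanodes for longer, so tune with care. -->
    <value>60000</value>
  </property>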
> Also, this is preceded by a whole bunch of slow operations with
> processingtimems close to 20 seconds like these - are these other slow
> WALEdit appends (slowed down due to HDFS)?
>
> 12/11/07 02:16:01 WARN ipc.HBaseServer: (responseTooSlow):
> {"processingtimems":21957,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@7198c05d),
> rpc version=1, client version=29, methodsFingerPrint=54742778","client":"
> 10.31.128.131:55327
> ","starttimems":1352254539935,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
>
> Thanks
> Varun
>
> On Wed, Nov 7, 2012 at 7:25 AM, Jeremy Carroll <[EMAIL PROTECTED]> wrote:
>
>> Sorry. It's early in the morning here. Did not see the 'read timeout'. +1
>> to Nicolas's response.
>>
>> On Wed, Nov 7, 2012 at 7:22 AM, Jeremy Carroll <[EMAIL PROTECTED]>
>> wrote:
>>
>>> One trick I have used for a while is to
>>> set dfs.datanode.socket.write.timeout in hdfs-site.xml to 0 (disabled).
>>> It's not going to solve your underlying IOPS capacity issue with your
>>> servers, but it can help for short bursty periods. Basically it's hiding
>>> the real issue, but it can help in the short term.
>>>
>>>
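(For reference, a minimal hdfs-site.xml sketch of the workaround Jeremy describes above; the property name comes from his message, and a value of 0 disables the DataNode write timeout entirely, so treat this as a stopgap rather than a fix:)

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- 0 disables the DataNode's socket write timeout, which can mask
         short bursts of slow disk I/O; it does not address the
         underlying IOPS shortage. -->
    <value>0</value>
  </property>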
>>> On Wed, Nov 7, 2012 at 1:43 AM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am seeing extremely long HDFS timeouts - and this seems to be
>> associated
>>>> with the loss of a DataNode. Here is the RS log:
>>>>
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>>> exception  for block blk_2813460962462751946_78454java.io.IOException:
>> Bad
>>>> response 1 for block blk_2813460962462751946_78454 from datanode
>>>> 10.31.190.107:9200
>>>>        at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
>>>>
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
>>>> blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
>>>> blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200,
>>>> 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
Jeremy Carroll 2012-11-07, 19:52
Jeremy Carroll 2012-11-07, 19:53
Varun Sharma 2012-11-07, 21:52