HBase >> mail # user >> Terribly long HDFS timeouts while appending to HLog


Varun Sharma 2012-11-07, 09:43
Nicolas Liochon 2012-11-07, 09:56
Jeremy Carroll 2012-11-07, 15:22
Jeremy Carroll 2012-11-07, 15:25
Varun Sharma 2012-11-07, 17:57
Re: Terribly long HDFS timeouts while appending to HLog
You should upgrade to 0.94, as you also had issues with row locks and the newer version has an improved miniBatchPut code base.
On Nov 7, 2012, at 9:57 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Thanks for the response. One more point is that I am running hadoop 1.0.4
> with hbase 0.92 - not sure if that is known to have these issues.
>
> I had one quick question though - these logs are taken from 10.31.138.145
> and from my understanding of the logs below, it's still going to another bad
> datanode to retrieve the block even though it should already have the
> data block - see the last line...
>
> 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
> exception  for block blk_2813460962462751946_78454java.io.IOException: Bad
> response 1 for block blk_2813460962462751946_78454 from datanode
> 10.31.190.107:9200
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
>
> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
> blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
> blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200,
> 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
>
> Looking at the DataNode logs, it seems that the local datanode is trying
> to connect to the remote bad datanode. Is this for replicating the WALEdit?
>
> 2012-11-07 02:17:45,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
> (PacketResponder 2 for Block blk_2813460962462751946_78454):
> PacketResponder blk_2813460962462751946_78454 2 Exception
> java.net.SocketTimeoutException:
> 66000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.31.138.245:33965
> remote=/10.31.190.107:9200]
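The 66-second figure in that DataNode log is presumably the pipeline read timeout: dfs.socket.timeout (60 seconds by default in Hadoop 1.x) plus a small per-downstream-datanode extension. A minimal hdfs-site.xml sketch showing where that knob lives; the 30000 ms value is purely illustrative, not a recommendation:

  <!-- hdfs-site.xml sketch: DFSClient/DataNode read timeout used in the write
       pipeline. 30000 ms is an illustrative value only. -->
  <property>
    <name>dfs.socket.timeout</name>
    <value>30000</value>
  </property>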
> Also, this is preceded by a whole bunch of slow operations with
> processingtimems close to 20 seconds, like these - are these other slow
> WALEdit appends (slowed down due to HDFS)?
>
> 12/11/07 02:16:01 WARN ipc.HBaseServer: (responseTooSlow):
> {"processingtimems":21957,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@7198c05d),
> rpc version=1, client version=29, methodsFingerPrint=54742778","client":"
> 10.31.128.131:55327
> ","starttimems":1352254539935,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
>
> Thanks
> Varun
>
> On Wed, Nov 7, 2012 at 7:25 AM, Jeremy Carroll <[EMAIL PROTECTED]> wrote:
>
>> Sorry. It's early in the morning here. Did not see the 'read timeout'. +1
>> to Nicolas's response.
>>
>> On Wed, Nov 7, 2012 at 7:22 AM, Jeremy Carroll <[EMAIL PROTECTED]>
>> wrote:
>>
>>> One trick I have used for a while is to
>>> set dfs.datanode.socket.write.timeout in hdfs-site.xml to 0 (disabled).
>>> It's not going to solve your underlying IOPS capacity issue with your
>>> servers, but it can help for short bursty periods. Basically it's hiding
>>> the real issue, but it can help in the short term.
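For reference, the workaround described above would look roughly like this in hdfs-site.xml; a value of 0 disables the DataNode's write-side socket timeout entirely, with the caveat already noted that it masks the underlying capacity problem:

  <!-- hdfs-site.xml sketch: disable the DataNode write-side socket timeout,
       per the workaround above. 0 means the write never times out. -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>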
>>>
>>>
>>> On Wed, Nov 7, 2012 at 1:43 AM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am seeing extremely long HDFS timeouts - and this seems to be
>> associated
>>>> with the loss of a DataNode. Here is the RS log:
>>>>
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>>> exception  for block blk_2813460962462751946_78454java.io.IOException:
>> Bad
>>>> response 1 for block blk_2813460962462751946_78454 from datanode
>>>> 10.31.190.107:9200
>>>>        at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
>>>>
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
>>>> blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
>>>> 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block
>>>> blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200,
>>>> 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
Jeremy Carroll 2012-11-07, 19:52
Jeremy Carroll 2012-11-07, 19:53
Varun Sharma 2012-11-07, 21:52