HBase >> mail # user >> Slow region server recoveries


Slow region server recoveries
Hi,

We are facing really slow HBase region server recoveries, about 20
minutes. The version is HBase 0.94.3 compiled with hadoop.profile=2.0.

The Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched in and
the stale node timeouts configured correctly. Dead node detection still
takes 10 minutes.
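
For context, the 10-minute figure matches how the HDFS NameNode derives its dead-node interval: twice the heartbeat recheck interval plus ten heartbeat intervals. Below is a minimal sketch of that arithmetic, assuming the stock defaults (dfs.namenode.heartbeat.recheck-interval = 300000 ms, dfs.heartbeat.interval = 3 s). The values actually set on this cluster are not stated in the post, and the function name is hypothetical.

```python
# Sketch of the NameNode dead-node interval arithmetic.
# Assumption: stock HDFS defaults; the cluster's real settings are not given here.
def dead_node_interval_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Return how long a datanode can stay silent before it is marked dead (ms)."""
    # 2 * recheck interval + 10 * heartbeat interval (converted to milliseconds)
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    ms = dead_node_interval_ms()
    print(f"{ms} ms = {ms / 60_000:.1f} minutes")
```

With the defaults this comes to a little over 10 minutes, so lowering the stale-node interval alone would not be expected to shorten dead-node detection.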

We see that our region server is trying to read an HLog and is stuck there
for a long time. Logs here:

2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /10.156.194.251:50010 for file
/hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
for block
BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
15000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818
remote=/10.156.194.251:50010]

I would expect HDFS-3703 to make the region server fail fast and move on to
the third datanode. As it stands, recovery is far too slow for production
use.

Varun