HBase >> mail # user >> RS unresponsive after series of deletes


RS unresponsive after series of deletes
Hi All-

I have a repeatable and troublesome HBase interaction that I would like some advice on.  

I am running a 5-node cluster on v0.94 on cdh3u3 and accessing it through the Java client API. Each RS has 32G of RAM and runs with a 16G heap, 4G of which is for the block cache. Used heap on each RS is well below the 16G available.

My client code has a set of batch deletes to carry out.  After successfully issuing 19 such deletes, the client begins logging HBase errors while trying to complete the next one.  It logs an ERROR every 60s, 10 times in all, and then gives up.

I estimate that the client successfully deleted about 270MB of data in the first 19 deletes.  Each batch delete covers about 144 rows, with a row size of about 100KB.
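
As a back-of-the-envelope check on that estimate, here is a minimal Java sketch using only the figures above (19 batches, about 144 rows per batch, about 100KB per row). The class and method names are illustrative, and it assumes 1KB = 1,000 bytes and 1MB = 1,000,000 bytes.

```java
public class DeleteVolumeEstimate {
    // Approximate bytes deleted: batches * rowsPerBatch * bytesPerRow.
    static long estimateBytes(int batches, int rowsPerBatch, long bytesPerRow) {
        return (long) batches * rowsPerBatch * bytesPerRow;
    }

    public static void main(String[] args) {
        // Figures taken from the estimate above; all are approximate.
        long bytes = estimateBytes(19, 144, 100_000L);
        System.out.println(bytes / 1_000_000.0 + " MB");
    }
}
```

This comes out at roughly 274MB, in line with the "about 270MB" figure.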

Here is the first of the 10 ERRORs logged in the client: http://pastebin.com/QMJsbgkZ.  The client errors come 1 per minute between 00:22:48 and 00:32:58, with the final error being: http://pastebin.com/ajaVxYUZ

Ultimately, the RS became responsive again. Looking at monitoring, I see a spike in CPU utilization on the node that was unresponsive; it goes from 2% utilization to 20% and stays there for a few minutes.  None of the other nodes in the cluster appear busy at this time.

Logs from the unresponsive RS are here: http://pastebin.com/z9qxGuJS.  There are no ERRORs in the log around the time of the unresponsiveness.

It appears from the server log that the "responseTooSlow" operation completed about 7min after the client gave up.  
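
To make the total stall time explicit, the sketch below adds the client-side error window (00:22:48 to 00:32:58) to the roughly 7min the server took to finish after the client gave up. The class and method names are illustrative, and the 7min value is the rough reading from the server log, not an exact measurement.

```java
import java.time.Duration;
import java.time.LocalTime;

public class StallTimeline {
    // Total stall = client error window + delay until the server finished.
    static Duration totalStall(LocalTime firstError, LocalTime lastError, Duration serverLag) {
        return Duration.between(firstError, lastError).plus(serverLag);
    }

    public static void main(String[] args) {
        Duration total = totalStall(
                LocalTime.parse("00:22:48"),   // first client ERROR
                LocalTime.parse("00:32:58"),   // final client ERROR
                Duration.ofMinutes(7));        // approximate, from the server log
        System.out.println(total.toMinutes() + " min");
    }
}
```

The client window is 10min 10s, so the whole operation spans about 17min.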

So, any ideas what was making the RS unresponsive? Did it really take 17min to delete 280MB of data?  

I can easily change the client RPC timeout and number of retries, but I feel there is something I am missing.  Any suggestions?
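
For reference, the client-side settings in question are `hbase.rpc.timeout` and `hbase.client.retries.number`. The sketch below just collects overrides in a plain map so that it runs without HBase on the classpath; the class name, method name, and values are hypothetical and not recommendations. In a real client each entry would presumably be applied with `conf.setInt(key, value)` on the `Configuration` returned by `HBaseConfiguration.create()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientTimeoutSettings {
    // Builds the property overrides; values are illustrative only.
    static Map<String, Integer> overrides(int rpcTimeoutMs, int retries) {
        Map<String, Integer> settings = new LinkedHashMap<>();
        settings.put("hbase.rpc.timeout", rpcTimeoutMs);        // per-RPC timeout, in ms
        settings.put("hbase.client.retries.number", retries);   // client retry count
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical values: a 5min RPC timeout and 5 retries.
        overrides(300_000, 5).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Raising these would only hide the symptom on the client side; it would not explain why the RS stalled.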

Thanks,
Ted
