RE: RS unresponsive after series of deletes
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?

I've been having difficulty reproducing this behavior in a controlled
manner. While I haven't been able to get my client to hang while doing
deletes, I have found a query that, when issued after a big delete,
takes a very long time (>10 min).

The timeline for this is:

1) insert a bunch of data
2) delete it all in 500 calls to HTable.delete(List<Delete>)
3) scan the table for the data that was just deleted (500 scans with
various row start/end keys, where each scan bails as soon as the first
key of the first row is found for a particular start/end pair)

The last part is very fast on undeleted data. For my recently deleted
data it takes ~15 min. When I look at RS CPU, it spikes just like in my
unresponsive episodes in the "wild".
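
In client terms, the whole exercise is roughly the sketch below
(simplified; the table name and the way I enumerate row-key batches and
scan ranges are placeholders, not my real schema). It's against the
plain HTable client API:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class DeleteThenScan {

  // Step 2: one batched delete call; ~500 of these cover the whole table.
  static void deleteBatch(HTable table, List<byte[]> rowKeys) throws Exception {
    List<Delete> deletes = new ArrayList<Delete>(rowKeys.size());
    for (byte[] row : rowKeys) {
      deletes.add(new Delete(row));
    }
    table.delete(deletes);
  }

  // Step 3: scan one start/end range and bail as soon as the first row shows up.
  static boolean rangeHasData(HTable table, byte[] startRow, byte[] stopRow) throws Exception {
    Scan scan = new Scan(startRow, stopRow);
    ResultScanner scanner = table.getScanner(scan);
    try {
      Result first = scanner.next();  // on freshly deleted ranges this next() is what crawls
      return first != null;
    } finally {
      scanner.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // placeholder table name
    // ... enumerate row-key batches and scan ranges here, then drive the two helpers above ...
    table.close();
  }
}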

Looking at a busy RS, I see:

Thread 50 (IPC Server handler 1 on 60020):
  State: RUNNABLE
  Blocked count: 384389
  Waited count: 12192804
  Stack:
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
    org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
    org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
    org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
    sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)

And lots of block cache churn in the logs:

2012-06-20 13:13:55,572 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 409.4 MB of total=3.4 GB
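
If that eviction churn is just these verification scans dragging blocks
from the freshly deleted ranges through the LRU cache, one knob I could
try is turning off block caching for the scan. To be clear, this is
speculation on my part and wouldn't reduce the delete markers the
scanner has to step over; setCacheBlocks/setCaching are just the
standard client-side Scan options:

  // Variant of the range check above that asks the RS not to cache what it reads.
  // (Speculative tweak; it does not reduce the delete markers the scanner must skip.)
  static boolean rangeHasDataNoCache(HTable table, byte[] startRow, byte[] stopRow) throws Exception {
    Scan scan = new Scan(startRow, stopRow);
    scan.setCacheBlocks(false);  // don't pull this scan's HFile blocks into the LRU block cache
    scan.setCaching(1);          // we only ever look at the first row anyway
    ResultScanner scanner = table.getScanner(scan);
    try {
      return scanner.next() != null;
    } finally {
      scanner.close();
    }
  }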

> It would also help to see the log during
> that time

More of the logs here: http://pastebin.com/4annviTS