RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-20, 20:49
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?

I've been having difficulty reproducing this behavior in a controlled
manner. While I haven't been able to get my client to hang while doing
deletes, I have found a query that, when issued after a big delete,
takes a very long time (>10 min).

The timeline for this is:

1) insert a bunch of data
2) delete it all in 500 calls to HTable.delete(List<Delete>)
3) scan the table for the data that was just deleted (500 scans with
various row start/end keys, where each scan bails out as soon as the
first key of the first row is found for that start/end pair); a rough
sketch of this is below
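
For reference, this is roughly what the client does in steps 2 and 3
(hand-written sketch against the 0.92-style client API; the table name,
row keys, and the batch list are placeholders, not our real code):

  // classes below are from org.apache.hadoop.hbase.client
  HTable table = new HTable(conf, "mytable");     // "mytable" is a placeholder

  // step 2: delete everything in 500 batched calls
  for (List<Delete> batch : deleteBatches) {      // deleteBatches: placeholder
      table.delete(batch);                        // HTable.delete(List<Delete>)
  }

  // step 3: probe one start/stop row range; bail on the first row found
  Scan scan = new Scan(startRow, stopRow);        // startRow/stopRow: placeholders
  scan.setCaching(1);                             // only need the first result back
  ResultScanner scanner = table.getScanner(scan);
  try {
      Result first = scanner.next();              // null when the range is empty
      boolean hasData = (first != null && !first.isEmpty());
  } finally {
      scanner.close();
  }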

The last part is very fast on undeleted data.  For my recently deleted
data it takes ~15 min.  When I look at RS CPU, it spikes the same way it
does during the unresponsive episodes in the "wild".

Looking at the busy RS, I see:

Thread 50 (IPC Server handler 1 on 60020):
  State: RUNNABLE
  Blocked count: 384389
  Waited count: 12192804
  Stack:
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
    org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
    org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
    org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
    sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
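
That reseek loop looks to me like the scanners grinding past all of the
fresh delete markers.  One thing I plan to try: force a major compaction
(which should drop the delete markers) and re-run the probe scans to see
if they go back to being fast.  Roughly (table name is a placeholder):

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.majorCompact("mytable");   // async request; delete markers are
                                   // dropped once the compaction completes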

And lots of block cache churn in the logs:

2012-06-20 13:13:55,572 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 409.4 MB of total=3.4 GB
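
Since these verification scans only read each block once, I am also
thinking about turning block caching off for them to cut down on the
eviction churn, e.g.:

  scan.setCacheBlocks(false);   // don't pollute the block cache with one-off scans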

> It would also help to see the log during
> that time

More of logs here: http://pastebin.com/4annviTS