HBase, mail # user - RS unresponsive after series of deletes


Earlier messages in this thread:
  Ted Tuttle 2012-06-13, 19:09
  Stack 2012-06-14, 17:38
  Ted Tuttle 2012-06-14, 19:36
  Ted Tuttle 2012-06-18, 22:08
  Jean-Daniel Cryans 2012-06-18, 22:17

RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-20, 20:49
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?

I've been having difficulty reproducing this behavior in a controlled
manner. While I haven't been able to get my client to hang while doing
deletes, I have found a query that, when issued after a big delete,
takes a very long time (>10 min).

The timeline for this is:

1) insert a bunch of data
2) delete it all in 500 calls to HTable.delete(List<Delete>)
3) scan the table for the data that was just deleted (500 scans with
various row start/end keys, where each scan bails as soon as the first
key of the first row is found for a particular start/end pair; a rough
sketch of this pattern follows the list)
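
For concreteness, steps 2 and 3 look roughly like this in client code.
This is only a sketch against the 0.92-era HTable API; the table name,
row keys, batch size, and range bounds are made up for illustration,
not my real ones:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteThenProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name

    // Step 2: delete everything in 500 batched calls to
    // HTable.delete(List<Delete>)
    for (int batch = 0; batch < 500; batch++) {
      List<Delete> deletes = new ArrayList<Delete>();
      for (int i = 0; i < 1000; i++) { // batch size is illustrative
        deletes.add(new Delete(Bytes.toBytes("row-" + batch + "-" + i)));
      }
      table.delete(deletes);
    }

    // Step 3: probe one start/end pair, bailing as soon as the first row
    // of the range comes back (repeated 500 times with different bounds)
    Scan scan = new Scan(Bytes.toBytes("row-0-"), Bytes.toBytes("row-1-"));
    ResultScanner scanner = table.getScanner(scan);
    try {
      Result first = scanner.next(); // this next() is what takes minutes
      System.out.println("range has data: " + (first != null));
    } finally {
      scanner.close();
    }
    table.close();
  }
}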

The last part is very fast on undeleted data.  For my recently deleted
data it takes ~15 min.  When I look at RS CPU, it spikes the way it
does during my unresponsive episodes in the "wild".

Looking at the busy RS, I see:

Thread 50 (IPC Server handler 1 on 60020):
  State: RUNNABLE
  Blocked count: 384389
  Waited count: 12192804
  Stack:
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
    org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
    org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
    org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
    sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)

And lots of block cache churn in the logs:

2012-06-20 13:13:55,572 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 409.4 MB of total=3.4 GB
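
For reference, the block cache those evictions are churning through is
sized as a fraction of the region server heap via hfile.block.cache.size
in hbase-site.xml. A minimal sketch of that setting (the 0.25 value is
illustrative, not a recommendation; a 3.4 GB cache would correspond to
roughly a 13-14 GB heap at that fraction):

<property>
  <!-- fraction of RS heap given to the HFile block cache;
       value shown is illustrative only -->
  <name>hfile.block.cache.size</name>
  <value>0.25</value>
</property>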

> It would also help to see the log during
> that time

More of the logs here: http://pastebin.com/4annviTS
Later messages in this thread:
  Jean-Daniel Cryans 2012-06-20, 21:32
  Ted Tuttle 2012-06-21, 00:07
  Ted Yu 2012-06-21, 03:52
  Ted Yu 2012-06-21, 04:33
  Ted Tuttle 2012-06-21, 04:54
  Ted Yu 2012-06-21, 05:13
  Ted Tuttle 2012-06-21, 14:02
  Ted Yu 2012-06-21, 17:32
  Ted Tuttle 2012-06-21, 18:00
  Ted Yu 2012-06-21, 18:10
  Ted Yu 2012-06-21, 14:19
  Ted Tuttle 2012-06-21, 18:16
  lars hofhansl 2012-06-23, 02:35