HBase, mail # user - Slow scanning for PrefixFilter on EncodedBlocks


J Mohamed Zahoor 2012-10-15, 15:21
Hi

My scanner performance is very slow when using a PrefixFilter on an
**Encoded Column** (encoded using FAST_DIFF both in memory and on disk).
I am using HBase 0.94.1.

jstack shows that a lot of time is spent seeking to the row.
Even if I give an exact row key match in the prefix filter, it takes about
two minutes to return a single row.
Running this multiple times still seems to send things to disk
(loadBlock):
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
    - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
    - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)

If I set the start and end row of the scan to that same row, the result
comes back very quickly.
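A likely contributor to the difference: a Scan whose only restriction is a PrefixFilter presumably still starts at the first row of the table, whereas an explicit start/stop row lets the region server seek straight to the prefix. The sketch below shows one common way to turn a prefix into a bounded row range [prefix, stopRow), assuming row keys are ordered lexicographically as unsigned bytes (as HBase orders them). The class and method names are hypothetical helpers, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes an exclusive stop row for a row-key prefix.
final class PrefixScanBounds {

    /**
     * Returns the smallest key that is greater than every key starting with
     * {@code prefix}. Returns an empty array when no such key exists (empty
     * prefix, or a prefix made only of 0xFF bytes), which by convention means
     * "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        // Walk backwards past trailing 0xFF bytes, which cannot be incremented.
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++; // safe: this byte is below 0xFF, so it cannot wrap to 0x00
                return Arrays.copyOf(stop, i + 1); // drop everything after the incremented byte
            }
        }
        return new byte[0]; // no finite upper bound
    }

    public static void main(String[] args) {
        byte[] startRow = "user123".getBytes(StandardCharsets.UTF_8);
        byte[] stopRow = stopRowForPrefix(startRow);
        // Prints the half-open row range that covers every key starting with "user123".
        System.out.println(new String(startRow, StandardCharsets.UTF_8)
                + " .. " + new String(stopRow, StandardCharsets.UTF_8));
    }
}
```

In HBase 0.94 the two arrays could then be passed to `new Scan(startRow, stopRow)`, optionally keeping the PrefixFilter as well, so the scan is bounded to the prefix instead of starting from the beginning of the table.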

I saw this thread:
http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
but that bug does not seem to affect 0.94.1.

Any pointers on why this is slow?
Note: the row does not have many columns (5 columns, less than 1 KB in
total) but has a lot of versions (1500+).

./zahoor