HBase >> mail # user >> Slow scanning for PrefixFilter on EncodedBlocks


J Mohamed Zahoor 2012-10-15, 15:21
J Mohamed Zahoor 2012-10-15, 17:27
lars hofhansl 2012-10-16, 07:21
lars hofhansl 2012-10-16, 21:39
Re: Slow scanning for PrefixFilter on EncodedBlocks
I reopened HBASE-6577

----- Original Message -----
From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Cc:
Sent: Tuesday, October 16, 2012 2:39 PM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Looks like this is exactly the scenario I was trying to optimize with HBASE-6577. Hmm...
________________________________
From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Tuesday, October 16, 2012 12:21 AM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

PrefixFilter does not do any seeking by itself, so I doubt this is related to HBASE-6757.
Does this only happen with FAST_DIFF compression?
If you can create an isolated test program (that sets up the scenario and then runs a scan with the filter such that it is very slow), I'm happy to take a look.

-- Lars

----- Original Message -----
From: J Mohamed Zahoor <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Monday, October 15, 2012 10:27 AM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Is this related to HBASE-6757?
I use a filter list with
  - prefix filter
  - filter list of column filters

/zahoor

On Monday, October 15, 2012, J Mohamed Zahoor wrote:

> Hi
>
> My scanner performance is very slow when using a PrefixFilter on an
> **encoded column** (encoded with FAST_DIFF both in memory and on disk).
> I am using HBase 0.94.1.
>
> jstack shows that much of the time is spent seeking to the row.
> Even if I give an exact row key match in the prefix filter, it takes about
> two minutes to return a single row.
> Running this multiple times also seems to send the reads to disk
> (loadBlock).
>
>
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
>
> If I set the start and end row to the same row in the scan, it comes back
> very quickly.
>
> I saw this link:
> http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> but it looks like things are fine in 0.94.1.
>
> Any pointers on why this is slow?
>
>
> Note: the row has few columns (5, and less than 1 KB) but many
> versions (1500+).
>
> ./zahoor
>
>
>
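
The quoted message notes that the scan returns quickly once the start and end row are set, while the PrefixFilter scan is slow. One way to act on that observation is to bound the scan by the prefix itself: use the prefix as the start row and the next key after the prefix range as the stop row. The sketch below only computes that stop row in plain Java. The class and method names (PrefixRange, stopRowForPrefix) are hypothetical and not part of HBase, and whether this helps in the scenario above has not been verified in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not an HBase class): derives scan bounds from a row-key prefix. */
final class PrefixRange {

    /**
     * Returns the smallest key that sorts after every key starting with
     * {@code prefix} in unsigned lexicographic byte order. Returns an empty
     * array when no such key exists (empty prefix, or every byte is 0xFF);
     * an empty stop row is assumed to mean "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Find the last byte that can be incremented without overflowing.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0]; // no upper bound exists
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1); // drop trailing 0xFF bytes
        stop[i]++;                                  // bump the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "row-42".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // With the HBase client, these two arrays would be used as the scan's
        // start row (inclusive) and stop row (exclusive) in place of the filter.
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

Running main prints "row-42 .. row-43". The column filters from the filter list could still be applied on top of the bounded scan; only the row selection moves from the filter to the scan range.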
J Mohamed Zahoor 2012-10-17, 08:42
J Mohamed Zahoor 2012-10-17, 08:44
anil gupta 2012-10-17, 16:41
lars hofhansl 2012-10-17, 18:11
anil gupta 2012-10-17, 19:25
lars hofhansl 2012-10-17, 22:35
J Mohamed Zahoor 2012-10-18, 07:45
Jerry Lam 2012-10-15, 17:43