HBase, mail # user - Slow scanning for PrefixFilter on EncodedBlocks


Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-17, 08:42
Sorry for the delay.

It looks like the problem is caused by PrefixFilter.
I assumed that it does a seek...

If I use startRow instead, it works fine. But is that the correct approach?

./zahoor
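
A minimal sketch of the startRow approach mentioned above, assuming the goal is to scan only rows that share a key prefix. Since PrefixFilter does not seek by itself (as noted later in this thread), one option is to bound the scan with a start row equal to the prefix and a stop row just past the last key carrying that prefix. The class and method names here (`PrefixScanBounds`, `startRowForPrefix`, `stopRowForPrefix`) are hypothetical helpers, not part of HBase.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an HBase class): derives [startRow, stopRow)
 * bounds that cover every row key beginning with a given prefix.
 */
final class PrefixScanBounds {
    private PrefixScanBounds() {}

    /** The start row is the prefix itself (returned as a defensive copy). */
    static byte[] startRowForPrefix(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Smallest key that sorts after every key with this prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array when no such bound exists (empty prefix or
     * all bytes 0xFF); an empty stop row means "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // wraps 0x7F to 0x80, which is correct for unsigned byte order
        return stop;
    }
}
```

With the HBase client on the classpath, the bounds could then be passed to `new Scan(startRow, stopRow)`, optionally keeping `scan.setFilter(new PrefixFilter(prefix))` as well. Whether this fully explains the slowness reported in this thread is not established here; it only avoids scanning rows that sort before the prefix.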
On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I reopened HBASE-6577
>
>
>
> ----- Original Message -----
> From: lars hofhansl <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Cc:
> Sent: Tuesday, October 16, 2012 2:39 PM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
> Looks like this is exactly the scenario I was trying to optimize with
> HBASE-6577. Hmm...
> ________________________________
> From: lars hofhansl <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Tuesday, October 16, 2012 12:21 AM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
> PrefixFilter does not do any seeking by itself, so I doubt this is related
> to HBASE-6757.
> Does this only happen with FAST_DIFF compression?
>
>
> If you can create an isolated test program (that sets up the scenario and
> then runs a scan with the filter such that it is very slow), I'm happy to
> take a look.
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: J Mohamed Zahoor <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Monday, October 15, 2012 10:27 AM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
> Is this related to HBASE-6757?
> I use a filter list with
>   - prefix filter
>   - filter list of column filters
>
> /zahoor
>
> On Monday, October 15, 2012, J Mohamed Zahoor wrote:
>
> > Hi
> >
> > My scanner performance is very slow when using a PrefixFilter on an
> > **encoded column** (encoded using FAST_DIFF both in memory and on disk).
> > I am using HBase 0.94.1.
> >
> > jstack shows that much of the time is spent seeking to the row.
> > Even if I give an exact row key match in the prefix filter, it takes about
> > two minutes to return a single row.
> > Running this multiple times also seems to redirect things to disk
> > (loadBlock).
> >
> >
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> > - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
> >
> > If I set the start and end row to the same row in the scan, it comes back
> > very quickly.
> >
> > I saw this link:
> > http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> > But it looks like things are fine in 0.94.1.
> >
> > Any pointers on why this is slow?