HBase >> mail # user >> Slow scanning for PrefixFilter on EncodedBlocks


Re: Slow scanning for PrefixFilter on EncodedBlocks
Hi ./zahoor:

I don't think it is the same issue.
Did you provide the Scan object with the startkey = prefix?

something like:
Scan scan = new Scan(prefix);

My understanding is that the PrefixFilter does not seek to the first key
with the prefix. Therefore, the scanner basically starts from the beginning
of the table and applies the prefix filter to each KeyValue. From this
perspective, the PrefixFilter might be improved by using a seek hint, though.
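
To make the start-key suggestion concrete, here is a minimal sketch of how the scan could be bounded on both sides so the scanner never walks rows outside the prefix. The class PrefixScanBounds and the method stopRowForPrefix are hypothetical names introduced for illustration, not part of HBase. The sketch assumes row keys compare as unsigned bytes and that an empty stop row means "scan to the end of the table". The HBase calls appear only in comments so that the block compiles without the client jar.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of HBase): derives an exclusive stop row for a row-key prefix.
final class PrefixScanBounds {

    // Returns the smallest row key that sorts after every key starting with `prefix`.
    // Returns an empty array when no such key exists (empty prefix, or all bytes are 0xFF),
    // which is assumed here to mean "no stop row".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte to get the next key after the prefix range
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "row-42".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.UTF_8));

        // With the HBase client on the classpath, the bounds would be used roughly like this:
        //   Scan scan = new Scan(prefix, stop);        // start row inclusive, stop row exclusive
        //   scan.setFilter(new PrefixFilter(prefix));  // largely redundant once both bounds are set
    }
}
```

With both bounds set, the PrefixFilter only has to reject rows inside an already narrow range, which is consistent with the observation in the original message that setting start and end row makes the scan fast.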

Best Regards,

Jerry

On Mon, Oct 15, 2012 at 1:27 PM, J Mohamed Zahoor <[EMAIL PROTECTED]> wrote:

> Is this related to HBASE-6757?
> I use a filter list with
>   - prefix filter
>   - filter list of column filters
>
> /zahoor
>
> On Monday, October 15, 2012, J Mohamed Zahoor wrote:
>
> > Hi
> >
> > My scanner performance is very slow when using a PrefixFilter on an
> > **Encoded Column** (encoded using FAST_DIFF both in memory and on disk).
> > I am using HBase 0.94.1.
> >
> > jstack shows that much of the time is spent seeking to the row.
> > Even if I give an exact row key match in the prefix filter, it takes about
> > two minutes to return a single row.
> > Running this multiple times still seems to send the reads to disk
> > (loadBlock).
> >
> >
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> > - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
> >
> > If I set the start and end row to the same row in the scan, it comes back
> > very quickly.
> >
> > I saw this link:
> > http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> > But it looks like things are fine in 0.94.1.
> >
> > Any pointers on why this is slow?
> >
> >
> > Note: the row does not have many columns (5, and less than a KB) but has
> > lots of versions (1500+).
> >
> > ./zahoor
> >
> >
> >
>