-
Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-15, 15:21
Hi,

My scanner performance is very slow when using a PrefixFilter on an **Encoded Column** (encoded using FAST_DIFF both in memory and on disk).
I am using HBase 0.94.1.

jstack shows that much of the time is spent seeking to the row.
Even if I give an exact row key match in the prefix filter, it takes about two minutes to return a single row.
Running this multiple times also seems to be redirecting things to disk (loadBlock).

    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
    - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
    - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)

If I set the start row and end row to the same row in the scan, the result comes back very quickly.

I saw this link:
http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
But it looks like things are fine in 0.94.1.

Any pointers on why this is slow?

Note: the row has few columns (5, and less than a KB) and lots of versions (1500+).

./zahoor
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-15, 17:27
Is this related to HBASE-6757?

I use a filter list with
- a prefix filter
- a filter list of column filters

/zahoor
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
lars hofhansl 2012-10-16, 07:21
PrefixFilter does not do any seeking by itself, so I doubt this is related to HBASE-6757.
Does this only happen with FAST_DIFF compression?

If you can create an isolated test program (one that sets up the scenario and then runs a scan with the filter such that it is very slow), I'm happy to take a look.

-- Lars
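The point that PrefixFilter does not seek can be illustrated without a cluster. The sketch below is a plain-Java model, not HBase code: a TreeMap stands in for the sorted rows of a table, and the names PrefixScanModel, scanWithFilterOnly and scanWithStartRow are made up for illustration. It assumes the filter is applied row by row from wherever the scan begins, so a scan with no start row has to walk every row that sorts before the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PrefixScanModel {
    /** Result of a modelled scan: matching keys plus how many rows were examined. */
    public static final class ScanResult {
        public final List<String> matches = new ArrayList<>();
        public int examined = 0;
    }

    /** Filter-only scan: begins at the first row of the table and tests every row. */
    public static ScanResult scanWithFilterOnly(TreeMap<String, String> table, String prefix) {
        String first = table.isEmpty() ? "" : table.firstKey();
        return scanFrom(table, first, prefix);
    }

    /** Start-row scan: jumps to the prefix first, then applies the same prefix test. */
    public static ScanResult scanWithStartRow(TreeMap<String, String> table, String prefix) {
        return scanFrom(table, prefix, prefix);
    }

    private static ScanResult scanFrom(TreeMap<String, String> table, String startRow, String prefix) {
        ScanResult result = new ScanResult();
        for (String key : table.tailMap(startRow, true).keySet()) {
            result.examined++;
            if (key.startsWith(prefix)) {
                result.matches.add(key);
            } else if (key.compareTo(prefix) > 0) {
                break; // keys are sorted, so nothing after this point can match
            }
        }
        return result;
    }

    /** Hypothetical table with 1000 rows named row0000 .. row0999. */
    public static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v");
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        ScanResult slow = scanWithFilterOnly(table, "row0900");
        ScanResult fast = scanWithStartRow(table, "row0900");
        System.out.println("filter only: examined " + slow.examined + ", matched " + slow.matches);
        System.out.println("start row:   examined " + fast.examined + ", matched " + fast.matches);
    }
}
```

Both scans return the same rows; only the number of rows examined differs. In the real client the counterpart of scanWithStartRow would be setting a start row on the Scan (Scan.setStartRow in the 0.94 API) in addition to the filter.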
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
lars hofhansl 2012-10-16, 21:39
Looks like this is exactly the scenario I was trying to optimize with HBASE-6577. Hmm...
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
lars hofhansl 2012-10-16, 22:08
I reopened HBASE-6577
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-17, 08:42
Sorry for the delay.
It looks like the problem is caused by PrefixFilter.
I assumed that it does a seek.

If I use startRow instead, it works fine. But is that the correct approach?

./zahoor
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-17, 08:44
First I upgraded my cluster to 0.94.2, but the problem persisted.
Then I moved to using startRow instead of the prefix filter.

./zahoor
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
anil gupta 2012-10-17, 16:41
Hi Zahoor,
I use the prefix filter heavily, and every time I have to define the startRow explicitly. So that is the current behavior. However, this behavior was initially confusing to me as well.
I think that when a prefix filter is defined, startRow=prefix could be set internally, with a user-defined startRow taking precedence over the startRow derived from the prefix filter. If the current PrefixFilter can be modified in that way, it will remove this confusion about the performance of the prefix filter.

Thanks,
Anil Gupta
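The precedence rule proposed above can be written down in a few lines. This is only a sketch of the suggestion, not existing HBase behavior: the class EffectiveStartRow and its method are hypothetical, and it assumes that an empty start row means "start at the beginning of the table", as it does for an HBase Scan.

```java
import java.nio.charset.StandardCharsets;

public class EffectiveStartRow {
    /**
     * Hypothetical rule: a start row set by the user wins; otherwise the
     * prefix itself is used as the start row so the scan can seek to it.
     */
    public static byte[] effectiveStartRow(byte[] userStartRow, byte[] prefix) {
        if (userStartRow != null && userStartRow.length > 0) {
            return userStartRow; // user-defined start row takes precedence
        }
        return prefix == null ? new byte[0] : prefix;
    }

    public static void main(String[] args) {
        byte[] prefix = "123".getBytes(StandardCharsets.UTF_8);
        byte[] chosen = effectiveStartRow(new byte[0], prefix);
        System.out.println(new String(chosen, StandardCharsets.UTF_8));
    }
}
```

One design question this leaves open is what to do when the user's start row sorts before the prefix; the rule as stated would keep the user's value even though the prefix would be a better seek target.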
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
lars hofhansl 2012-10-17, 18:11
That is a good point. There is no reason why prefix filter cannot issue a seek to the first KV for that prefix.
Although it might lead to a practice where people use the prefix filter when they in fact should just set the start row.
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
anil gupta 2012-10-17, 19:25
Hi Lars,
There is a specific use case for this:

Table: Suppose I have a rowkey: <customer_id><event_timestamp><uid>

Use case: I would like to get all the events of customer_id=123.
Case 1: If I only use startRow=123, then I will also get events of other customers having customer_id > 123, since the scanner will keep fetching rows until the end of the table.
Case 2: If I use prefixFilter=123 and startRow=123, then I will get the correct result.

IMHO, adding the feature of smartly setting the startRow in PrefixFilter won't hurt any existing functionality. The uses of startRow and PrefixFilter will still be different.

Thanks,
Anil Gupta

On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> That is a good point. There is no reason why prefix filter cannot issue a seek to the first KV for that prefix.
> Although it could lead to a practice where people would use the prefix filter when they in fact should just set the start row.
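The original post in this thread notes that setting both a start row and an end row makes the scan return quickly. As a rough illustration of how Case 2 could also be expressed as a row range, the sketch below derives an exclusive stop row from a prefix by incrementing its last byte. The class and method names (`PrefixRange`, `stopRowForPrefix`) are made up for this example and are not part of HBase; the code uses only the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRange {

    /**
     * Returns the smallest row key that sorts after every key starting with
     * the given prefix (byte-wise order), or an empty array when no such key
     * exists (the prefix is empty or all 0xFF), meaning "scan to the end".
     */
    public static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // the last remaining byte is not 0xFF, so no overflow
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "123".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(prefix);
        // Rows in [ "123", "124" ) are exactly the rows that start with "123".
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // prints 124
    }
}
```

With a start row equal to the prefix and this value as the exclusive end row, the range alone bounds the scan on both sides, which is one possible alternative to combining startRow with a PrefixFilter.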
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
lars hofhansl 2012-10-17, 22:35
Oh yeah, I meant that one should always set the startrow as a matter of practice - if possible - and never rely on the filter alone.
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-18, 07:45
+1 for making PrefixFilter seek instead of using a startRow explicitly.

./zahoor
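To make the proposal concrete, here is a small stand-alone model of what "PrefixFilter seeks" could mean: when the filter sees a row that sorts before the prefix, it asks the scanner to jump straight to the prefix instead of stepping row by row. This is only a sketch under stated assumptions. `Verdict`, `check`, and `scan` are hypothetical stand-ins written for this example, a `TreeSet` of strings plays the role of the sorted table, and none of it is the actual HBase filter API or the change tracked in the JIRA issues mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class SeekingPrefixFilterSketch {

    /** What the hypothetical filter tells the scan loop to do next. */
    enum Verdict { INCLUDE, SEEK_TO_PREFIX, DONE }

    static Verdict check(String rowKey, String prefix) {
        if (rowKey.startsWith(prefix)) {
            return Verdict.INCLUDE;
        }
        // Before the prefix: ask for a jump. After it: nothing left to find.
        return rowKey.compareTo(prefix) < 0 ? Verdict.SEEK_TO_PREFIX : Verdict.DONE;
    }

    /** Returns matching keys; examined[0] is incremented once per key looked at. */
    public static List<String> scan(NavigableSet<String> rows, String prefix, int[] examined) {
        List<String> out = new ArrayList<>();
        String key = rows.isEmpty() ? null : rows.first();
        while (key != null) {
            examined[0]++;
            switch (check(key, prefix)) {
                case INCLUDE:
                    out.add(key);
                    key = rows.higher(key);    // step to the next row
                    break;
                case SEEK_TO_PREFIX:
                    key = rows.ceiling(prefix); // jump to the first row >= prefix
                    break;
                case DONE:
                    key = null;                // sorted past the prefix: stop
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (int customer = 0; customer < 200; customer++) {
            for (int event = 0; event < 5; event++) {
                rows.add(String.format("%03d-%d", customer, event));
            }
        }
        int[] examined = new int[1];
        List<String> hits = scan(rows, "123-", examined);
        System.out.println(hits.size() + " rows returned, " + examined[0] + " rows examined");
    }
}
```

In this model the scan starts at the first row of the table without any start row being set, yet it looks at only the first row, the matching rows, and the one row that ends the prefix range.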
-
Re: Slow scanning for PrefixFilter on EncodedBlocks
Jerry Lam 2012-10-15, 17:43
Hi ./zahoor:
I don't think it is the same issue. Did you provide the Scan object with the start key = prefix? Something like:

    Scan scan = new Scan(prefix);

My understanding is that the PrefixFilter does not seek to the first key with the prefix; therefore, the scanner basically starts from the beginning of the table and applies the PrefixFilter to each KeyValue. From this perspective, the PrefixFilter might be improved by using a seek hint, though.

Best Regards,
Jerry

On Mon, Oct 15, 2012 at 1:27 PM, J Mohamed Zahoor <[EMAIL PROTECTED]> wrote:
> Is this related to HBASE-6757 ?
> I use a filter list with
> - prefix filter
> - filter list of column filters
>
> /zahoor
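The cost difference described above can be illustrated with a toy model. The sketch below uses a `TreeMap` as a stand-in for a sorted table and counts how many rows a prefix-filtered scan has to look at, first starting from the beginning of the table and then starting at the prefix. It assumes the filter ends the scan once a row sorts past the prefix. `PrefixScanCost`, `buildTable`, and `rowsExamined` are names invented for this example, not HBase classes, and the row counts say nothing about real timings.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixScanCost {

    /** Builds a toy table with keys like "007-3" (customer id, event number). */
    public static NavigableMap<String, String> buildTable(int customers, int eventsPerCustomer) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int c = 0; c < customers; c++) {
            for (int e = 0; e < eventsPerCustomer; e++) {
                table.put(String.format("%03d-%d", c, e), "event");
            }
        }
        return table;
    }

    /**
     * Counts the rows a scan looks at when it starts at startRow and applies
     * a prefix check to every row, stopping at the first row past the prefix.
     */
    public static int rowsExamined(NavigableMap<String, String> table, String startRow, String prefix) {
        int examined = 0;
        for (String key : table.tailMap(startRow, true).keySet()) {
            examined++;
            if (key.startsWith(prefix)) {
                continue; // this row would be returned to the client
            }
            if (key.compareTo(prefix) > 0) {
                break;    // sorted past the prefix, nothing more can match
            }
            // Otherwise the row sorts before the prefix: skipped, keep going.
        }
        return examined;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = buildTable(200, 5);
        System.out.println(rowsExamined(table, "", "123-"));     // prints 621
        System.out.println(rowsExamined(table, "123-", "123-")); // prints 6
    }
}
```

Both calls return the same five rows for customer 123; only the amount of skipped data differs, which is the point made above about passing the prefix as the start key of the Scan.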
|