HBase, mail # user - Slow scanning for PrefixFilter on EncodedBlocks


Re: Slow scanning for PrefixFilter on EncodedBlocks
J Mohamed Zahoor 2012-10-18, 07:45
+1 for making PrefixFilter seek instead of using a startRow explicitly.

./zahoor
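
For readers following the thread: until the filter seeks on its own, the workaround quoted below is to bound the scan by hand. The following is a minimal sketch, not HBase code. It assumes row keys compare as unsigned bytes and that an empty stop row means "scan to the end of the table"; the class and method names (PrefixScanBounds, startRowFor, stopRowFor) are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: derives [startRow, stopRow) bounds that cover exactly
// the row keys beginning with a given prefix.
final class PrefixScanBounds {

    // The first possible key with this prefix is the prefix itself.
    static byte[] startRowFor(byte[] prefix) {
        return prefix.clone();
    }

    // Smallest key that sorts after every key with this prefix: drop trailing
    // 0xFF bytes, then increment the last remaining byte. An empty result
    // means there is no upper bound (assumed to mean "end of table").
    static byte[] stopRowFor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // wraps 0x7F to 0x80, which is still "next" in unsigned order
        return stop;
    }
}
```

With the prefix "123" this gives start row "123" and stop row "124". Passing both to a Scan (for example via the Scan(byte[] startRow, byte[] stopRow) constructor) should keep the scanner from running on into other customers' rows, and the PrefixFilter can still be set on top as a safety net.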

On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Oh yeah, I meant that one should always set the startrow as a matter of
> practice - if possible - and never rely on the filter alone.
>
>
>
> ________________________________
>  From: anil gupta <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Wednesday, October 17, 2012 12:25 PM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
>
> Hi Lars,
>
> There is a specific use case for this:
>
> Table: Suppose I have a rowkey: <customer_id><event_timestamp><uid>
>
> Use case: I would like to get all the events of customer_id=123.
> Case 1: If I only use startRow=123, then I will also get events of other
> customers having customer_id > 123, since the scanner will keep
> fetching rows until the end of the table.
> Case 2: If I use prefixFilter=123 and startRow=123, then I will get the
> correct result.
>
> IMHO, having PrefixFilter smartly set the startRow itself won't hurt any
> existing functionality. The use of startRow and PrefixFilter will
> still be different.
>
> Thanks,
> Anil Gupta
>
>
>
> On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> That is a good point. There is no reason why prefix filter cannot issue a
> seek to the first KV for that prefix.
> >Although it could lead to a practice where people use the prefix filter
> >when they in fact should just set the start row.
> >
> >
> >
> >
> >
> >----- Original Message -----
> >From: anil gupta <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Cc:
> >Sent: Wednesday, October 17, 2012 9:41 AM
> >Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >
> >Hi Zahoor,
> >
> >I heavily use the prefix filter. Every time, I have to explicitly define the
> >startRow. So, that's the current behavior. However, initially this behavior
> >was confusing to me also.
> >I think that when a prefix filter is defined, startRow=prefix can be set
> >internally. A user-defined startRow would take precedence over the
> >PrefixFilter startRow. If the current PrefixFilter can be modified in that
> >way, then it will eradicate this confusion regarding the performance of the
> >prefix filter.
> >
> >Thanks,
> >Anil Gupta
> >
> >On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor <[EMAIL PROTECTED]>
> wrote:
> >
> >> First I upgraded my cluster to 0.94.2; even then the problem persisted.
> >> Then I moved to using startRow instead of the prefix filter.
> >>
> >>
> >> ./zahoor
> >>
> >> On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Sorry for the delay.
> >> >
> >> > It looks like the problem is because of PrefixFilter...
> >> > I assumed that it does a seek...
> >> >
> >> > If I use startRow instead, it works fine. But is it the correct
> >> > approach?
> >> >
> >> > ./zahoor
> >> >
> >> >
> >> > On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <[EMAIL PROTECTED]
> >> >wrote:
> >> >
> >> >> I reopened HBASE-6577
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message -----
> >> >> From: lars hofhansl <[EMAIL PROTECTED]>
> >> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> >> >> [EMAIL PROTECTED]>
> >> >> Cc:
> >> >> Sent: Tuesday, October 16, 2012 2:39 PM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Looks like this is exactly the scenario I was trying to optimize with
> >> >> HBASE-6577. Hmm...
> >> >> ________________________________
> >> >> From: lars hofhansl <[EMAIL PROTECTED]>
> >> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >> >> Sent: Tuesday, October 16, 2012 12:21 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> PrefixFilter does not do any seeking by itself, so I doubt this is
> >> >> related to HBASE-6757.
> >> >> Does this only happen with FAST_DIFF compression?
> >> >>
> >> >>
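
The cost difference described in this thread can be illustrated with a toy model. This is only a sketch, not HBase code: a sorted set of strings stands in for the table, and the model assumes that a filter-only scan starts at the first row and stops once it has passed the prefix range, while a scan with a start row begins at the prefix. All names (PrefixScanModel, rowsExaminedFilterOnly, rowsExaminedWithStartRow) are hypothetical.

```java
import java.util.NavigableSet;

// Toy model: counts how many rows a scan has to look at to return all rows
// with a given prefix, with and without an explicit start row.
final class PrefixScanModel {

    // Prefix check only: the scan begins at the first row of the table and
    // rejects every row before the prefix one by one.
    static int rowsExaminedFilterOnly(NavigableSet<String> table, String prefix) {
        int examined = 0;
        for (String row : table) {
            examined++;
            boolean pastPrefix = !row.startsWith(prefix) && row.compareTo(prefix) > 0;
            if (pastPrefix) {
                break; // sorted order: no later row can match
            }
        }
        return examined;
    }

    // Start row = prefix: the scan jumps straight to the first candidate row.
    static int rowsExaminedWithStartRow(NavigableSet<String> table, String prefix) {
        int examined = 0;
        for (String row : table.tailSet(prefix, true)) {
            examined++;
            if (!row.startsWith(prefix)) {
                break; // first row past the prefix range
            }
        }
        return examined;
    }
}
```

In this model, the filter-only count grows with the number of rows that sort before the prefix, while the count with a start row depends only on the number of matching rows.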