HBase >> mail # user >> Slow scanning for PrefixFilter on EncodedBlocks


Re: Slow scanning for PrefixFilter on EncodedBlocks
+1 for making PrefixFilter seek instead of using a startRow explicitly.

./zahoor

On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Oh yeah, I meant that one should always set the startrow as a matter of
> practice - if possible - and never rely on the filter alone.
>
>
>
> ________________________________
>  From: anil gupta <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Wednesday, October 17, 2012 12:25 PM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
>
> Hi Lars,
>
> There is a specific use case for this:
>
> Table: Suppose i have a rowkey:<customer_id><event_timestamp><uid>
>
> Use case: I would like to get all the events of customer_id=123.
> Case 1: If I only use startRow=123 then I will get events of other
> customers having customer_id > 123, since the scanner will keep on
> fetching rows until the end of the table.
> Case 2: If i use prefixFilter=123 and startRow=123 then i will get the
> correct result.
>
> IMHO, adding the feature of smartly setting the startRow in PrefixFilter
> won't hurt any existing functionality. Use of startRow and PrefixFilter will
> still be different.
>
> Thanks,
> Anil Gupta
>
>
>
> On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> That is a good point. There is no reason why prefix filter cannot issue a
> seek to the first KV for that prefix.
> >Although it could lead to a practice where people would use the prefix filter when
> they in fact should just set the start row.
> >
> >
> >
> >
> >
> >----- Original Message -----
> >From: anil gupta <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Cc:
> >Sent: Wednesday, October 17, 2012 9:41 AM
> >Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >
> >Hi Zahoor,
> >
> >I heavily use prefix filter. Every time i have to explicitly define the
> >startRow. So, that's the current behavior. However, initially this
> behavior
> >was confusing to me also.
> >I think that when a Prefix filter is defined then internally the
> >startRow=prefix can be set. User defined StartRow takes precedence over
> the
> >prefixFilter startRow. If the current prefixFilter can be modified in that
> >way then it will eradicate this confusion regarding performance of prefix
> >filter.
> >
> >Thanks,
> >Anil Gupta
> >
> >On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor <[EMAIL PROTECTED]>
> wrote:
> >
> >> First i upgraded my cluster to 94.2.. even then the problem persisted..
> >> Then i moved to using startRow instead of prefix filter..
> >>
> >>
> >> ./zahoor
> >>
> >> On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Sorry for the delay.
> >> >
> >> > It looks like the problem is because of PrefixFilter...
> >> > I assumed that it does a seek...
> >> >
> >> > If i use startRow instead.. it works fine.. But is it the correct
> >> approach?
> >> >
> >> > ./zahoor
> >> >
> >> >
> >> > On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <[EMAIL PROTECTED]
> >> >wrote:
> >> >
> >> >> I reopened HBASE-6577
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message -----
> >> >> From: lars hofhansl <[EMAIL PROTECTED]>
> >> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> >> >> [EMAIL PROTECTED]>
> >> >> Cc:
> >> >> Sent: Tuesday, October 16, 2012 2:39 PM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Looks like this is exactly the scenario I was trying to optimize with
> >> >> HBASE-6577. Hmm...
> >> >> ________________________________
> >> >> From: lars hofhansl <[EMAIL PROTECTED]>
> >> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >> >> Sent: Tuesday, October 16, 2012 12:21 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> PrefixFilter does not do any seeking by itself, so I doubt this is
> >> >> related to HBASE-6757.
> >> >> Does this only happen with FAST_DIFF compression?
> >> >>
> >> >>
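
The behaviour discussed in this thread can be illustrated with a small, self-contained Java sketch. It does not use the HBase client API: the class `PrefixScanSketch`, its `scan` method, and the sample row keys are all hypothetical, and a `TreeMap` stands in for a table's sorted rows. It assumes, as described above, that the prefix filter does not seek and therefore reads every row from wherever the scan starts. It also assumes that the filter ends the scan once a row sorts past the prefix. Java `String` ordering is used here in place of HBase's byte ordering, which is equivalent only for plain ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a table whose row keys are kept in sorted order (hypothetical, not HBase code). */
public class PrefixScanSketch {

    /** Rows returned by a simulated scan, plus how many rows the scanner had to read. */
    public static final class ScanResult {
        public final List<String> rows = new ArrayList<>();
        public int examined = 0;
    }

    /**
     * Simulates a scan over the sorted table.
     * startRow == null means "start at the first row of the table".
     * prefix == null means "no prefix filter".
     */
    public static ScanResult scan(NavigableMap<String, String> table, String startRow, String prefix) {
        ScanResult result = new ScanResult();
        // Setting a start row is the only thing that lets the scan skip earlier rows.
        NavigableMap<String, String> view =
                (startRow == null) ? table : table.tailMap(startRow, true);
        for (String row : view.keySet()) {
            result.examined++;
            if (prefix == null) {
                result.rows.add(row);            // no filter: every row is returned
            } else if (row.startsWith(prefix)) {
                result.rows.add(row);            // filter accepts the row
            } else if (row.compareTo(prefix) > 0) {
                break;                           // assumption: filter stops once past the prefix
            }
            // Otherwise the row sorts before the prefix: it is read, rejected, and the scan continues.
        }
        return result;
    }

    /** Sample rows keyed as <customer_id>|<event_timestamp>|<uid> (made-up data). */
    public static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        String[] keys = {
            "100|t1|a", "100|t2|b", "110|t1|c",
            "123|t1|d", "123|t2|e", "123|t3|f",
            "124|t1|g", "200|t1|h"
        };
        for (String key : keys) {
            table.put(key, "event");
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        ScanResult filterOnly = scan(table, null, "123");
        ScanResult startOnly = scan(table, "123", null);
        ScanResult both = scan(table, "123", "123");
        System.out.println("filter only:      returned " + filterOnly.rows.size()
                + ", examined " + filterOnly.examined);
        System.out.println("startRow only:    returned " + startOnly.rows.size()
                + ", examined " + startOnly.examined);
        System.out.println("startRow + filter: returned " + both.rows.size()
                + ", examined " + both.examined);
    }
}
```

With this sample data, the filter-only scan reads 7 rows to return 3, the startRow-only scan returns 5 rows (including customers 124 and 200), and the combined scan reads 4 rows to return the 3 events for customer 123. On a real table the gap in rows read grows with the number of rows that sort before the prefix.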