Re: PrefixFilter
What Anil said.
Filters are executed per Store (i.e. per region per column family). So each filter in each store would need to seek to the start row.
It is more efficient to let the scanner do that ahead of time by setting the startrow to the prefix.
We should document that if we haven't.

-- Lars
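
To illustrate the point above, here is a minimal sketch in plain Java with no HBase dependency. It models a single store as a sorted set holding the example row keys from this thread, and counts how many rows a prefix check has to look at with and without a start row. The names PrefixScanSketch, Result, scan and sampleTable are hypothetical and not part of HBase; the loop only mimics the filterRowKey()/filterAllRemaining() logic quoted further down, and String comparison stands in for byte comparison, which is only equivalent for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; this is not HBase code.
final class PrefixScanSketch {

    /** Outcome of a simulated scan: the matching rows and the number of rows examined. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    /** The example row keys from this thread, kept in sorted order. */
    static TreeSet<String> sampleTable() {
        return new TreeSet<>(Arrays.asList(
                "Prefix1/Suffix", "Prefix2/Suffix", "Prefix3/Suffix",
                "Prefix3/Suffix2", "Prefix4/Suffix"));
    }

    /**
     * Walks the sorted keys from startRow (inclusive) and applies a prefix check
     * to every row it meets, stopping at the first row that sorts past the prefix.
     */
    static Result scan(TreeSet<String> table, String startRow, String prefix) {
        Result result = new Result();
        for (String row : table.tailSet(startRow, true)) {
            result.examined++;
            if (row.startsWith(prefix)) {
                result.rows.add(row);   // row matches the prefix: keep it
            } else if (row.compareTo(prefix) > 0) {
                break;                  // past the prefix: nothing later can match
            }
            // otherwise the row sorts before the prefix: read and discarded
        }
        return result;
    }

    public static void main(String[] args) {
        Result fromTableStart = scan(sampleTable(), "", "Prefix3");
        Result fromPrefix = scan(sampleTable(), "Prefix3", "Prefix3");
        // prints "examined=5 matched=2"
        System.out.println("examined=" + fromTableStart.examined
                + " matched=" + fromTableStart.rows.size());
        // prints "examined=3 matched=2"
        System.out.println("examined=" + fromPrefix.examined
                + " matched=" + fromPrefix.rows.size());
    }
}
```

Both runs return the same two rows. The difference is the number of rows read and thrown away before the first match, which grows with the size of the table when no start row is set.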

----- Original Message -----
From: anil gupta <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Sudarshan Kadambi <[EMAIL PROTECTED]>
Cc:
Sent: Monday, August 12, 2013 2:35 PM
Subject: Re: PrefixFilter

Hi Sudarshan,

While using the prefix filter, you also have to set the scan's start row
and stop row (Scan.setStartRow() and Scan.setStopRow()) to get the behavior
that you are expecting.
This has been discussed on the mailing list before, yet no changes have
been made to the behavior of PrefixFilter.
Setting the start row to Prefix3 will make the scan jump directly to your
prefix.
Let me know if you need further details on using the prefix filter for very
fast prefix matches.

~Anil
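
One common way to pick the stop row for a prefix scan is to take the prefix and increment its last byte, so that the range [prefix, stop) covers exactly the keys that start with the prefix. The sketch below is plain Java and assumes the stop row is exclusive and that an empty stop row means "scan to the end of the table". The class and method names (PrefixStopRow, stopRowForPrefix) are hypothetical, not an HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of HBase.
final class PrefixStopRow {

    /**
     * Returns the smallest key that sorts after every key starting with prefix
     * (compared as unsigned bytes), or an empty array when no such key exists,
     * i.e. the prefix is empty or consists only of 0xFF bytes.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // A trailing 0xFF byte cannot be incremented, so drop it and carry left.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // assumed to mean "no upper bound"
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "Prefix3".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        // prints "Prefix4"
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

For the keys in this thread, the start row would be Prefix3 and the computed stop row Prefix4, so the scan would cover Prefix3/Suffix and Prefix3/Suffix2 and end before Prefix4/Suffix.
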
On Mon, Aug 12, 2013 at 1:55 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <
[EMAIL PROTECTED]> wrote:

> I'm willing to be told I'm completely wrong here, but it seems like the
> prefix filter should be capable of using the same mechanism used in a
> row-key lookup or a scan with a start and stop row.
>
> If HBase were to be like a hash table with no notion of sorted-ness, I can
> understand a partial-key lookup requiring something akin to a full-table
> scan. But given that HBase orders records by row-key, a prefix lookup
> should be able to do a binary search over the index?
>
> ----- Original Message -----
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> At: Aug 12 2013 16:06:58
>
> Adding back user@
>
> bq. does it jump directly to Prefix3
>
> I don't think so.
>
> Are your prefixes of fixed length?
> If so, take a look at FuzzyRowFilter.
>
> Cheers
>
> On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> <[EMAIL PROTECTED]> wrote:
>
> Ted: Thanks for looking that up.
>
> If I have rows with the following keys in my table (let's say table has
> only 1 region):
> Prefix1/Suffix
> Prefix2/Suffix
> Prefix3/Suffix
> Prefix3/Suffix2
> Prefix4/Suffix
>
> and if I specify a prefix filter with Prefix3, does it jump directly to
> Prefix3, or does it read in both Prefix1/Suffix and Prefix2/Suffix and
> discard them before returning Prefix3/Suffix and Prefix3/Suffix2?
>
> Using the prefix filter is much slower than a scan with start row/end row
> and I'm trying to understand why. Thanks!
>
>
> ----- Original Message -----
> From: [EMAIL PROTECTED]
> To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), [EMAIL PROTECTED]
> At: Aug 12 2013 14:08:17
>
> In the filterAllRemaining() method:
>
>   public boolean filterAllRemaining() {
>     return passedPrefix;
>   }
>
> In filterRowKey():
>
>     // if they are equal, return false => pass row
>     // else return true, filter row
>     // if we are passed the prefix, set flag
>     int cmp = Bytes.compareTo(buffer, offset, this.prefix.length,
>         this.prefix, 0, this.prefix.length);
>     if (cmp > 0) {
>       passedPrefix = true;
>     }
>
> So once the scan has passed the prefix, the remaining rows are skipped.
>
> On Mon, Aug 12, 2013 at 11:01 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> <[EMAIL PROTECTED]> wrote:
>
> Anyone know if the prefix filter[1] does a full table scan?
>
> 1 -
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
>
>
>
--
Thanks & Regards,
Anil Gupta