HBase, mail # user - PrefixFilter


Sudarshan Kadambi 2013-08-12, 18:01
Ted Yu 2013-08-12, 18:08
Sudarshan Kadambi 2013-08-12, 20:55
Re: PrefixFilter
anil gupta 2013-08-12, 21:35
Hi Sudarshan,

When using the prefix filter, you also have to set the startRow and
stopRow of the scan to get the behavior you are expecting.
This has been discussed on the mailing list before, yet the behavior of
PrefixFilter has not been changed.
Setting the startRow to Prefix3 makes the scan jump directly to your
prefix.
Let me know if you need further details on using the prefix filter for very
fast prefix matches.
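
To make the difference concrete, here is a minimal, self-contained sketch
that simulates a sorted table with a TreeSet. It is not HBase code: the
class PrefixScanSketch and the helpers stopRowForPrefix and scan are
hypothetical names made up for illustration, and string ordering stands in
for byte ordering (equivalent only for ASCII keys). The scan loop mirrors
the checks in the PrefixFilter code quoted below. In HBase itself, the
start and stop keys would presumably be passed to Scan.setStartRow() and
Scan.setStopRow().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Self-contained simulation; none of these names are HBase APIs.
class PrefixScanSketch {

    // Smallest key that sorts after every key starting with `prefix`,
    // usable as an exclusive stop row. Empty result means "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int len = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (len > 0 && prefix[len - 1] == (byte) 0xFF) {
            len--;
        }
        if (len == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, len);
        stop[len - 1]++;
        return stop;
    }

    // Walks the sorted keys from `startRow` (inclusive) and applies the
    // prefix check to each row. Returns the number of rows examined and
    // appends the matching keys to `matches`.
    static int scan(TreeSet<String> table, String startRow, String prefix,
                    List<String> matches) {
        int examined = 0;
        for (String key : table.tailSet(startRow, true)) {
            examined++;
            if (key.length() < prefix.length()) {
                continue; // too short to match: row is filtered out
            }
            int cmp = key.substring(0, prefix.length()).compareTo(prefix);
            if (cmp > 0) {
                break; // past the prefix: all remaining rows are skipped
            }
            if (cmp == 0) {
                matches.add(key);
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(Arrays.asList(
            "Prefix1/Suffix", "Prefix2/Suffix", "Prefix3/Suffix",
            "Prefix3/Suffix2", "Prefix4/Suffix"));

        List<String> filterOnlyMatches = new ArrayList<>();
        int filterOnly = scan(table, "", "Prefix3", filterOnlyMatches);

        List<String> startRowMatches = new ArrayList<>();
        int withStartRow = scan(table, "Prefix3", "Prefix3", startRowMatches);

        byte[] stop = stopRowForPrefix("Prefix3".getBytes(StandardCharsets.UTF_8));
        System.out.println("filter only:    examined " + filterOnly
            + ", matched " + filterOnlyMatches);
        System.out.println("with start row: examined " + withStartRow
            + ", matched " + startRowMatches);
        System.out.println("stop row: " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

For the five keys from this thread, the filter-only walk touches all five
rows, while starting at Prefix3 touches three. Both return the same two
matches. Adding the computed stop row would also avoid reading the first
row past the prefix.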

~Anil
On Mon, Aug 12, 2013 at 1:55 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <
[EMAIL PROTECTED]> wrote:

> I'm willing to be told I'm completely wrong here, but it seems like the
> prefix filter should be capable of using the same mechanism used in a
> row-key lookup or a scan with a start and stop row.
>
> If HBase were to be like a hash table with no notion of sorted-ness, I can
> understand a partial-key lookup requiring something akin to a full-table
> scan. But given that HBase orders records by row-key, a prefix lookup
> should be able to do a binary search over the index?
>
> ----- Original Message -----
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> At: Aug 12 2013 16:06:58
>
> Adding back user@
>
> bq. does it jump directly to Prefix3
>
> I don't think so.
>
> Are your prefixes of fixed length ?
> If so, take a look at FuzzyRowFilter.
>
> Cheers
>
> On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> <[EMAIL PROTECTED]> wrote:
>
> Ted: Thanks for looking that up.
>
> If I have rows with the following keys in my table (let's say table has
> only 1 region):
> Prefix1/Suffix
> Prefix2/Suffix
> Prefix3/Suffix
> Prefix3/Suffix2
> Prefix4/Suffix
>
> and if I specify a prefix filter with Prefix3, does it jump directly to
> Prefix3, or does it read in both Prefix1/Suffix and Prefix2/Suffix and
> discard them before returning Prefix3/Suffix and Prefix3/Suffix2?
>
> Using the prefix filter is much slower than a scan with start row/end row
> and I'm trying to understand why. Thanks!
>
>
> ----- Original Message -----
> From: [EMAIL PROTECTED]
> To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), [EMAIL PROTECTED]
> At: Aug 12 2013 14:08:17
>
> In the filterAllRemaining() method:
>
>   public boolean filterAllRemaining() {
>     return passedPrefix;
>   }
>
> In filterRowKey():
>
>     // if they are equal, return false => pass row
>     // else return true, filter row
>     // if we are passed the prefix, set flag
>     int cmp = Bytes.compareTo(buffer, offset, this.prefix.length,
>         this.prefix, 0, this.prefix.length);
>     if(cmp > 0) {
>       passedPrefix = true;
>     }
>
> So once the scan has passed the prefix, the remaining rows are skipped.
>
> On Mon, Aug 12, 2013 at 11:01 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> <[EMAIL PROTECTED]> wrote:
>
> Anyone know if the prefix filter[1] does a full table scan?
>
> [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
>
>
>
--
Thanks & Regards,
Anil Gupta
mixueqiang 2013-08-16, 03:36
lars hofhansl 2013-08-13, 04:45
Ted Yu 2013-08-12, 20:06