HBase user mailing list, thread: PrefixFilter


I'm willing to be told I'm completely wrong here, but it seems like the prefix filter should be able to use the same mechanism as a row-key lookup or a scan with a start and stop row.

If HBase were like a hash table with no notion of ordering, I could understand a partial-key lookup requiring something akin to a full-table scan. But given that HBase orders records by row key, shouldn't a prefix lookup be able to do a binary search over the index?
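To illustrate the reasoning above, here is a minimal Java sketch using plain collections (not HBase code) of a prefix lookup over sorted keys. A binary search finds the first key at or after the prefix, and because all keys sharing a prefix are contiguous in sorted order, the lookup can stop at the first key that no longer matches. The class and method names are hypothetical; the sample keys are the ones used further down the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PrefixBinarySearch {
    // Returns every key in sortedKeys that starts with prefix.
    // Assumes sortedKeys is in natural (lexicographic) order, as row keys are in HBase.
    static List<String> rowsWithPrefix(List<String> sortedKeys, String prefix) {
        int idx = Collections.binarySearch(sortedKeys, prefix);
        // A negative result encodes the insertion point as -(insertionPoint) - 1;
        // the insertion point is the first key that sorts after the prefix.
        int start = idx >= 0 ? idx : -idx - 1;
        List<String> result = new ArrayList<>();
        // Keys sharing the prefix are contiguous, so stop at the first non-matching key.
        for (int i = start; i < sortedKeys.size() && sortedKeys.get(i).startsWith(prefix); i++) {
            result.add(sortedKeys.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("Prefix1/Suffix", "Prefix2/Suffix", "Prefix3/Suffix",
                "Prefix3/Suffix2", "Prefix4/Suffix");
        System.out.println(rowsWithPrefix(keys, "Prefix3"));
    }
}
```

Running main prints `[Prefix3/Suffix, Prefix3/Suffix2]`; Prefix1/Suffix and Prefix2/Suffix are never compared against the prefix in the linear part of the lookup.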

----- Original Message -----
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
At: Aug 12 2013 16:06:58

Adding back user@

bq. does it jump directly to Prefix3

I don't think so.

Are your prefixes of fixed length?
If so, take a look at FuzzyRowFilter.
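For reference, FuzzyRowFilter is constructed from pairs of a fuzzy row key and a mask; per its javadoc, a mask byte of 0 means that position in the row key is fixed and 1 means any byte is accepted there. The sketch below is a simplified, hypothetical model of that per-row match test only (the names are illustrative, not HBase's implementation). The real filter also uses the fuzzy key to hint the scanner at the next row that could match, which is what lets it skip ahead; that part is not modelled here.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatchSketch {
    // Simplified model of the fuzzy key/mask rule (not HBase code):
    // mask[i] == 0 -> row[i] must equal fuzzyKey[i]; mask[i] == 1 -> any byte is accepted.
    // Only the first fuzzyKey.length bytes of the row are checked.
    static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
        if (row.length < fuzzyKey.length) {
            return false; // assumption: rows shorter than the fuzzy key never match
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            if (mask[i] == 0 && row[i] != fuzzyKey[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] rows = {"Prefix1/Suffix", "Prefix2/Suffix", "Prefix3/Suffix",
                "Prefix3/Suffix2", "Prefix4/Suffix"};
        // Fixed-length 7-byte prefix "PrefixN" followed by the fixed bytes "/Suffix2".
        byte[] fuzzyKey = bytes("Prefix0/Suffix2");
        byte[] mask = new byte[fuzzyKey.length]; // all zeros: every byte fixed
        mask[6] = 1; // the digit after "Prefix" may be any byte
        for (String r : rows) {
            if (matches(bytes(r), fuzzyKey, mask)) {
                System.out.println(r);
            }
        }
    }
}
```

With the example keys only `Prefix3/Suffix2` is printed. The fixed-length requirement matters because the mask addresses byte positions, so every row must put its components at the same offsets.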

Cheers

On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]> wrote:

Ted: Thanks for looking that up.

If I have rows with the following keys in my table (let's say the table has only one region):
Prefix1/Suffix
Prefix2/Suffix
Prefix3/Suffix
Prefix3/Suffix2
Prefix4/Suffix

and if I specify a prefix filter with Prefix3, does it jump directly to Prefix3, or does it read in both Prefix1/Suffix and Prefix2/Suffix and discard them before returning Prefix3/Suffix and Prefix3/Suffix2?

Using the prefix filter is much slower than a scan with a start row and stop row, and I'm trying to understand why. Thanks!
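A start/stop-row scan can cover exactly the rows with a given prefix: use the prefix as the (inclusive) start row and, as the (exclusive) stop row, the smallest key that sorts after every key with that prefix. The region server can then begin reading at the start row instead of at the beginning of the region. Below is a hedged sketch of computing that stop row. stopRowForPrefix is a hypothetical helper (newer HBase releases offer a comparable convenience, Scan.setRowPrefixFilter), and it assumes HBase's unsigned, lexicographic byte ordering of row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStopRow {
    // Hypothetical helper: returns the smallest row key that sorts after every key
    // beginning with `prefix`, for use as an exclusive stop row.
    // Bytes are treated as unsigned, matching HBase's row-key ordering.
    // Returns an empty array ("scan to the end of the table") if no such key exists.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // safe: this byte is below 0xFF, so it does not wrap to 0x00
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("Prefix3".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

For the prefix Prefix3 this prints `Prefix4`, so a scan from Prefix3 (inclusive) to Prefix4 (exclusive) returns Prefix3/Suffix and Prefix3/Suffix2 and should not need to read Prefix1/Suffix or Prefix2/Suffix at all.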
----- Original Message -----
From: [EMAIL PROTECTED]
To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), [EMAIL PROTECTED]
At: Aug 12 2013 14:08:17

In filterAllRemaining() method:

  public boolean filterAllRemaining() {
    return passedPrefix;
  }

In filterRowKey():

    // if they are equal, return false => pass row
    // else return true, filter row
    // if we are passed the prefix, set flag
    int cmp = Bytes.compareTo(buffer, offset, this.prefix.length, this.prefix, 0,
        this.prefix.length);
    if (cmp > 0) {
      passedPrefix = true;
    }

So once the scan has moved past the prefix, filterAllRemaining() returns true and the remaining rows are skipped.
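To make the excerpt concrete, here is a small, self-contained Java model of that logic (a hypothetical re-implementation, not the HBase class), driven by a scan with no start row, as in the question above. It follows the quoted comments: an equal prefix passes the row, anything else is filtered, and a row whose leading bytes sort after the prefix sets passedPrefix so that filterAllRemaining() ends the scan. The short-row check and the scan loop are assumptions. With the example keys, it reads Prefix1/Suffix and Prefix2/Suffix and discards them, and stops only after seeing Prefix4/Suffix: rows after the prefix are skipped, but rows before it are not.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PrefixFilterModel {
    private final byte[] prefix;
    private boolean passedPrefix = false;

    PrefixFilterModel(byte[] prefix) {
        this.prefix = prefix;
    }

    // Unsigned lexicographic comparison of the first `len` bytes of a and b.
    private static int compareFirst(byte[] a, byte[] b, int len) {
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Mirrors the quoted comments: returns true to filter the row out, false to pass it.
    boolean filterRowKey(byte[] row) {
        if (row.length < prefix.length) {
            return true; // assumption: keys shorter than the prefix are filtered out
        }
        int cmp = compareFirst(row, prefix, prefix.length);
        if (cmp > 0) {
            passedPrefix = true; // every later row also sorts after the prefix
        }
        return cmp != 0;
    }

    boolean filterAllRemaining() {
        return passedPrefix;
    }

    // Simulates a scan with no start row: rows are read from the beginning of the table
    // and handed to the filter until filterAllRemaining() says to stop.
    static int scan(List<String> sortedRows, String prefix, List<String> out) {
        PrefixFilterModel f = new PrefixFilterModel(prefix.getBytes(StandardCharsets.UTF_8));
        int rowsRead = 0;
        for (String row : sortedRows) {
            if (f.filterAllRemaining()) {
                break;
            }
            rowsRead++;
            if (!f.filterRowKey(row.getBytes(StandardCharsets.UTF_8))) {
                out.add(row);
            }
        }
        return rowsRead;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Prefix1/Suffix", "Prefix2/Suffix", "Prefix3/Suffix",
                "Prefix3/Suffix2", "Prefix4/Suffix");
        List<String> matched = new ArrayList<>();
        int read = scan(rows, "Prefix3", matched);
        System.out.println(matched + " after reading " + read + " rows");
    }
}
```

Running main prints `[Prefix3/Suffix, Prefix3/Suffix2] after reading 5 rows`. In a real table the cost of the rows before the prefix grows with how far into the region the prefix sits, which is consistent with the filter-only scan being slower than a start/stop-row scan.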
On Mon, Aug 12, 2013 at 11:01 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]> wrote:

Does anyone know if the prefix filter [1] does a full table scan?

1 - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html