HBase, mail # dev - Re: Questions on FuzzyRowFilter - 2014-05-16, 20:06
Re: Questions on FuzzyRowFilter
Hi Mike,
I agree with you - the way you've outlined is exactly the way Phoenix has
implemented it. It's a bit of a problem with terminology, though. We call
it salting: http://phoenix.incubator.apache.org/salted.html. We hash the
key, take the hash modulo the SALT_BUCKET value you provide, and prepend
the resulting single-byte value to the row key. Maybe you can coin a good term for
this technique?
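
To make the hash / mod / prepend steps concrete, here is a minimal sketch in Python. It is not Phoenix's code: CRC32 is only a stand-in for whatever hash Phoenix actually uses, and the names SALT_BUCKETS, salt_byte and salted_key are made up for illustration.

```python
import zlib

# Hypothetical bucket count; must fit in a single byte (1..256).
SALT_BUCKETS = 4


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Hash the key and reduce it modulo the bucket count."""
    # CRC32 is an assumption made for this sketch, not Phoenix's hash.
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend the single salt byte to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key
```

The salt is a pure function of the key, so a point lookup can recompute it and go straight to the right bucket; only range scans need to touch every bucket.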

FWIW, you don't lose the ability to do a range scan when you salt (or
hash-the-key and mod by the number of "buckets"), but you do need to run a
scan for each possible value of your salt byte (0 through SALT_BUCKET - 1). Then
the client does a merge sort among these scans. It performs well.
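
A rough sketch of that scan-per-bucket-then-merge idea, again in Python and again only an illustration: the "table" is just a sorted list of salted keys held in memory, CRC32 stands in for the real hash, and BUCKETS, salted, scan_bucket and range_scan are hypothetical names, not Phoenix or HBase APIs.

```python
import bisect
import heapq
import zlib

BUCKETS = 4  # hypothetical salt bucket count


def salted(row_key: bytes, buckets: int = BUCKETS) -> bytes:
    # CRC32 is an assumed stand-in hash for this sketch.
    return bytes([zlib.crc32(row_key) % buckets]) + row_key


def scan_bucket(sorted_keys, salt, start, stop):
    """Range scan [start, stop) inside one salt bucket of a sorted key list."""
    prefix = bytes([salt])
    lo = bisect.bisect_left(sorted_keys, prefix + start)
    hi = bisect.bisect_left(sorted_keys, prefix + stop)
    for key in sorted_keys[lo:hi]:
        yield key[1:]  # strip the salt byte before returning the row key


def range_scan(sorted_keys, start, stop, buckets=BUCKETS):
    """Run one scan per salt value, then merge the sorted streams."""
    scans = [scan_bucket(sorted_keys, s, start, stop) for s in range(buckets)]
    return list(heapq.merge(*scans))


# Usage: 20 rows stored under salted keys, then scanned by their unsalted range.
table = sorted(salted(b"row%02d" % i) for i in range(20))
rows = range_scan(table, b"row05", b"row10")
```

Each per-bucket scan is already sorted by the unsalted key, so the client-side merge only has to compare the heads of the BUCKETS streams.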

Thanks,
James
On Fri, May 9, 2014 at 11:57 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
 