HBase, mail # user - Is it necessary to set MD5 on rowkey?


Re: Is it necessary to set MD5 on rowkey?
Damien Hardy 2012-12-18, 09:33
Hello,

There is a middle ground between sequential keys (hotspotting risk) and MD5
(heavy scans):
  * you can use composite keys with a field that segregates the data
(hostname, product name, metric name), as OpenTSDB does
  * or use a salt with a limited number of values (for example,
substr(md5(rowid),0,1) gives 16 values),
    so that a scan becomes a combination of 16 filters, one for each salt value;
    you can base your code on HBaseWD by Sematext

http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
       https://github.com/sematext/HBaseWD
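
To make the salt idea concrete, here is a minimal sketch in plain Python. It does not use the HBase or HBaseWD APIs; the names salted_key and scan_ranges are hypothetical, and the date-like row ids are only an assumed example. It shows how a one-character salt spreads writes over 16 buckets and how a logical range scan turns into 16 key ranges:

```python
import hashlib

# The 16 possible salt values: the first hex digit of an MD5 digest.
SALT_CHARS = "0123456789abcdef"


def salted_key(row_id: str) -> str:
    """Prefix row_id with the first hex character of its MD5 digest."""
    salt = hashlib.md5(row_id.encode("utf-8")).hexdigest()[0]
    return salt + row_id


def scan_ranges(start_id: str, stop_id: str):
    """Return one (start_key, stop_key) pair per salt bucket.

    A logical scan over [start_id, stop_id) becomes 16 physical scans,
    one per salt value; the client has to merge their results.
    """
    return [(salt + start_id, salt + stop_id) for salt in SALT_CHARS]
```

For example, scan_ranges("20121201", "20121219") yields 16 ranges, and any key produced by salted_key for a row id in that interval falls in exactly one of them. Results coming back from the 16 scans are ordered only within each bucket, so the client must re-sort them if global order matters.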

Cheers,
2012/12/18 bigdata <[EMAIL PROTECTED]>

> Many articles tell me that using MD5 for the rowkey, or part of it, is a good
> method to balance the records stored in different parts. But what if I want
> to search sequential rowkeys, for example with a date as the rowkey or part
> of it? I cannot use a rowkey filter to scan a range of dates in one pass once
> the date has been hashed with MD5. How can I balance these two needs?
> Thanks.
>
>
--
Damien HARDY