HBase >> mail # user >> Is it necessary to set MD5 on rowkey?


Re: Is it necessary to set MD5 on rowkey?
Hello,

There is a middle ground between sequential keys (hotspotting risk) and MD5
keys (expensive scans):
  * you can use composed keys with a field that segregates the data
(hostname, product name, metric name), like OpenTSDB
  * or use a salt with a limited number of values (for example,
substr(md5(rowid),0,1) gives 16 values),
    so that a scan becomes a combination of 16 filters, one per salt value.
    You can base your code on HBaseWD by Sematext:

    http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
    https://github.com/sematext/HBaseWD
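
To make the salt idea concrete, here is a minimal sketch in plain Python. It is not HBase client code and not the HBaseWD API: the function names (salt_of, salted_key, scan_ranges, merged_scan), the "-" separator, and the sorted list standing in for a table are all assumptions made for illustration.

```python
import bisect
import hashlib
import heapq

# First hex digit of an MD5 digest: 16 possible salt values.
SALT_CHARS = "0123456789abcdef"


def salt_of(row_id: str) -> str:
    """Equivalent of substr(md5(rowid),0,1): the first hex character of the digest."""
    return hashlib.md5(row_id.encode("utf-8")).hexdigest()[0]


def salted_key(row_id: str) -> str:
    """Prefix the original key with its salt so sequential keys spread over 16 buckets."""
    return f"{salt_of(row_id)}-{row_id}"


def scan_ranges(start: str, stop: str):
    """One (start, stop) pair per salt value; together they cover [start, stop)."""
    return [(f"{s}-{start}", f"{s}-{stop}") for s in SALT_CHARS]


def merged_scan(sorted_keys, start: str, stop: str):
    """Run the 16 range scans over a sorted list of salted keys (a stand-in
    for the table) and merge the results back into original-key order."""
    per_bucket = []
    for lo, hi in scan_ranges(start, stop):
        i = bisect.bisect_left(sorted_keys, lo)
        j = bisect.bisect_left(sorted_keys, hi)
        # Drop the two-character "<salt>-" prefix to recover the original key.
        per_bucket.append([k[2:] for k in sorted_keys[i:j]])
    # Each bucket is already sorted, so a k-way merge restores global order.
    return list(heapq.merge(*per_bucket))
```

With date-based row ids, a call such as merged_scan(table, "20121210", "20121220") returns the original keys in that date range in order, at the cost of 16 small range scans instead of one.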

Cheers,

2012/12/18 bigdata <[EMAIL PROTECTED]>

> Many articles say that using an MD5 hash as the rowkey, or as part of it,
> is a good way to balance records across different regions. But if I want to
> read records with sequential rowkeys, for example when a date is all or part
> of the rowkey, I cannot use a rowkey filter to scan a range of dates in one
> pass once the date has been hashed with MD5. How can I balance these two needs?
> Thanks.
>
>
--
Damien HARDY