HBase >> mail # user >> Is it necessary to set MD5 on rowkey?


Re: Is it necessary to set MD5 on rowkey?
On Wed, Dec 19, 2012 at 1:26 PM, David Arthur <[EMAIL PROTECTED]> wrote:

> Let's say you want to decompose a url into domain and path to include in
> your row key.
>
> You could of course just use the url as the key, but you will see
> hotspotting since most will start with "http".

Doesn't the original Bigtable paper [0] design around this problem by
dropping the protocol and only storing the domain? *goes to check* Yes, it
does.
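
For illustration, here is a minimal sketch of that kind of key. It drops the scheme and reverses the hostname components so that pages from one domain sort next to each other (the Bigtable paper's example keys look like com.cnn.www). The function name url_to_row_key is hypothetical, not an HBase or Bigtable API, and ignoring the query string is an assumption made to keep the sketch short.

```python
from urllib.parse import urlsplit


def url_to_row_key(url: str) -> bytes:
    """Build a row key from a URL: scheme dropped, hostname components reversed.

    Hypothetical helper for illustration only. The query string and
    fragment are ignored (an assumption, not something the thread specifies).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # urlsplit lowercases the hostname
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"             # treat a bare domain as the root path
    return (reversed_host + path).encode("utf-8")


if __name__ == "__main__":
    for u in ("http://maps.google.com/index.html", "https://www.cnn.com"):
        print(u, "->", url_to_row_key(u))
```

Since the "http" prefix is gone, keys no longer share a common leading string, and a prefix scan on a reversed domain picks up all of its pages.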

Personally, I've never encountered an HBase schema design problem where
salting really nailed it. It's an okay place to start with initial designs,
especially if you don't know your data well. I'm a big fan of using the
natural variance in the data itself to solve this problem. OpenTSDB does
this quite well, IMHO. Plus, it's kind of a game or data puzzle -- how to
use the data's nature to your advantage :)
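
For comparison, salting usually means prefixing each key with a small bucket number derived from a hash of the key. A rough sketch, assuming 16 buckets and a one-byte prefix; the names salted_row_key and NUM_BUCKETS are hypothetical, and MD5 is used here only because it is the hash named in the thread subject:

```python
import hashlib

NUM_BUCKETS = 16  # assumption: chosen for illustration, not a recommendation


def salted_row_key(key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a one-byte bucket derived from its MD5 digest.

    The same key always maps to the same bucket, so point lookups can
    recompute the prefix. Requires 1 <= buckets <= 256 to fit in one byte.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must be between 1 and 256")
    bucket = hashlib.md5(key).digest()[0] % buckets
    return bytes([bucket]) + key


if __name__ == "__main__":
    print(salted_row_key(b"http://example.com/page"))
```

The trade-off is that adjacent keys land in different buckets, so a range scan has to be issued once per bucket and the results merged.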

Just my 2¢
-n

[0]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf