HBase >> mail # user >> Is it necessary to set MD5 on rowkey?

Re: Is it necessary to set MD5 on rowkey?
On Wed, Dec 19, 2012 at 1:26 PM, David Arthur <[EMAIL PROTECTED]> wrote:

> Let's say you want to decompose a url into domain and path to include in
> your row key.
>
> You could of course just use the url as the key, but you will see
> hotspotting since most will start with "http".

Doesn't the original Bigtable paper [0] design around this problem by
dropping the protocol and only storing the domain? *goes to check* Yes, it
does.
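
For illustration, here is a minimal sketch of building a row key that way. It assumes the Webtable convention from the Bigtable paper, where the hostname components are also reversed (e.g. "com.cnn.www") so that pages from one domain sort next to each other. The function name url_to_rowkey is hypothetical and not part of any HBase API.

```python
from urllib.parse import urlsplit


def url_to_rowkey(url: str) -> bytes:
    """Build a row key from a URL: drop the scheme, reverse the hostname.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased, port is dropped
    # "maps.google.com" -> "com.google.maps"
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (reversed_host + path).encode("utf-8")


if __name__ == "__main__":
    print(url_to_rowkey("http://maps.google.com/index.html"))
```

Keys built like this no longer share an "http" prefix, and they are spread by whatever variety the hostnames themselves have.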

Personally, I've never encountered an HBase schema design problem where
salting really nailed it. It's an okay place to start with initial designs,
especially if you don't know your data well. I'm a big fan of using the
natural variance in the data itself to solve this problem. OpenTSDB does
this quite well, IMHO. Plus, it's kind of a game or data puzzle -- how to
use the data's nature to your advantage :)
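
For comparison, this is roughly what salting looks like. It is a sketch under assumptions: the bucket count of 16, the two-digit prefix format, and the name salted_rowkey are all made up for the example, and MD5 is used only because it is the hash named in the thread subject.

```python
import hashlib

NUM_BUCKETS = 16  # assumption: pick to match your region count


def salted_rowkey(key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a bucket number derived from its MD5 hash.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.md5(key).digest()
    bucket = digest[0] % buckets  # deterministic, so point reads can recompute it
    return b"%02d-" % bucket + key


if __name__ == "__main__":
    print(salted_rowkey(b"http://example.org/index.html"))
```

The prefix spreads writes across buckets, but keys that were adjacent are now scattered, so a range scan has to be issued once per bucket and merged on the client.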

Just my 2¢
-n

[0]:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf