HBase >> mail # user >> Use of MD5 as row keys - is this safe?


Jonathan Bishop 2012-07-20, 16:22
Damien Hardy 2012-07-20, 16:31
Michel Segel 2012-07-20, 19:16
Joe Pallas 2012-07-20, 20:35
Michel Segel 2012-07-22, 12:00
Ethan Jewett 2012-07-22, 14:21
Michael Segel 2012-07-23, 11:55
Jonathan Bishop 2012-07-23, 16:58
Amandeep Khurana 2012-07-23, 18:38
Re: Use of MD5 as row keys - is this safe?
Hi,

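I use reversed hex for auto-incremented ids: the id is written in hexadecimal and its digits are reversed, so the fastest-changing digit comes first.
For example:
id=123456 (0x1E240), row key=042E1
id=123457 (0x1E241), row key=142E1
I started using this approach recently, and it seems to work pretty well.
Keys are distributed uniformly across all regions, with no hot-spotting.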
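
A minimal sketch of that conversion, in case it helps. The function name reversed_hex_key is only for illustration, and it assumes ids are non-negative integers and that the key is stored as an uppercase hex string without padding:

```python
def reversed_hex_key(record_id: int) -> str:
    """Return the uppercase hex digits of record_id in reverse order.

    Hypothetical helper: assumes a non-negative integer id and an
    unpadded, string-typed row key.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    # format(..., "X") gives uppercase hex without the "0x" prefix;
    # the slice [::-1] reverses the digit string.
    return format(record_id, "X")[::-1]


print(reversed_hex_key(123456))  # 042E1
print(reversed_hex_key(123457))  # 142E1
```

Consecutive ids differ in their last hex digit, which becomes the first character of the key, so neighbouring ids land in different parts of the key space.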

2012/7/20 Jonathan Bishop <[EMAIL PROTECTED]>

> Hi,
>
> I know it is commonly suggested to use an MD5 checksum to create a row
> key from some other identifier, such as a string or long. This is usually
> done to guard against hot-spotting and seems to work well.
>
> My concern is that there is no guard against collision when this is done - two
> different strings or longs could produce the same row-key. Although this is
> very unlikely, it is bothersome to consider this possibility for large
> systems.
>
> So what I usually do is concatenate the MD5 with the original identifier...
>
> MD5(id) + id
>
> which assures that the rowkey is both randomly distributed and unique.
>
> Is this necessary, or is it the common practice to just use the MD5
> checksum itself?
>
> Thanks,
>
> Jon
>
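
For reference, the MD5(id) + id scheme from the quoted message could look roughly like this. It is only a sketch: the name md5_prefixed_key is made up, and it assumes string ids encoded as UTF-8 and the raw 16-byte digest (not the hex form) as the prefix:

```python
import hashlib

MD5_LEN = 16  # an MD5 digest is always 16 bytes


def md5_prefixed_key(identifier: str) -> bytes:
    """Build a row key as MD5(id) followed by the id itself.

    Hypothetical helper: assumes UTF-8 string ids and a raw-digest prefix.
    """
    raw = identifier.encode("utf-8")
    return hashlib.md5(raw).digest() + raw


key = md5_prefixed_key("user-42")
# The prefix has a fixed length, so the original id can always be
# recovered from the tail of the key.
original_id = key[MD5_LEN:].decode("utf-8")
print(len(key), original_id)
```

Because the digest prefix has a fixed length and the original id follows it unchanged, two different ids can never produce the same key, even if their MD5 digests were to collide.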
Rob Roland 2012-07-20, 19:21