HBase >> mail # user >> Use of MD5 as row keys - is this safe?


Jonathan Bishop 2012-07-20, 16:22
Damien Hardy 2012-07-20, 16:31
Michel Segel 2012-07-20, 19:16
Joe Pallas 2012-07-20, 20:35
Michel Segel 2012-07-22, 12:00
Ethan Jewett 2012-07-22, 14:21
Michael Segel 2012-07-23, 11:55
Jonathan Bishop 2012-07-23, 16:58
Amandeep Khurana 2012-07-23, 18:38

Re: Use of MD5 as row keys - is this safe?
Hi,

I use reversed hex for auto-incremented ids.
For example:
id=123456, row key=042E1
id=123457, row key=142E1
I started using this approach only recently, but so far it seems to work well.
Rows are distributed uniformly across all regions, with no hot-spotting.
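
A minimal sketch of the reversed-hex idea, assuming the key is simply the uppercase hex string of the id written backwards, as in the two examples above. The function name and the optional `width` padding parameter are hypothetical and not part of the original message:

```python
def reversed_hex_key(row_id: int, width: int = 0) -> str:
    """Return the uppercase hex digits of row_id in reverse order.

    width is an assumed, optional zero-padding length applied before
    reversing; the default (0) leaves the hex string unpadded.
    """
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    hex_digits = format(row_id, "X").zfill(width)
    # Reversing puts the fastest-changing digit first, so consecutive
    # ids start with different characters and sort far apart.
    return hex_digits[::-1]


print(reversed_hex_key(123456))  # 042E1
print(reversed_hex_key(123457))  # 142E1
```

Because reversing a string is a one-to-one mapping, distinct ids still produce distinct keys, and the original id can be recovered by reversing the key again.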

2012/7/20 Jonathan Bishop <[EMAIL PROTECTED]>

> Hi,
>
> I know it is commonly suggested to use an MD5 checksum to create a row
> key from some other identifier, such as a string or long. This is usually
> done to guard against hot-spotting and seems to work well.
>
> My concern is that there is no guard against collision when this is done:
> two different strings or longs could produce the same row key. Although
> this is very unlikely, it is bothersome to consider this possibility for
> large systems.
>
> So what I usually do is concatenate the MD5 with the original identifier...
>
> MD5(id) + id
>
> which assures that the rowkey is both randomly distributed and unique.
>
> Is this necessary, or is it the common practice to just use the MD5
> checksum itself?
>
> Thanks,
>
> Jon
>
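
The MD5(id) + id scheme from the quoted message could be sketched as follows, assuming the id is a string encoded as UTF-8 and the raw 16-byte digest (rather than its hex form) is used as the prefix. The function name is hypothetical:

```python
import hashlib


def md5_prefixed_key(identifier: str) -> bytes:
    """Build a row key as MD5(id) followed by the id itself.

    Assumes UTF-8 encoding of the identifier and a raw 16-byte digest
    prefix; neither detail is specified in the original message.
    """
    raw = identifier.encode("utf-8")
    # The digest spreads keys across the key space; the appended id
    # keeps two different ids distinct even if their digests collide.
    return hashlib.md5(raw).digest() + raw


key = md5_prefixed_key("abc")
print(key[:16].hex())  # 900150983cd24fb0d6963f7d28e17f72
print(key[16:])        # b'abc'
```

Since the digest has a fixed length, the original id can always be read back from the key by skipping the first 16 bytes.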
Rob Roland 2012-07-20, 19:21