
HBase, mail # user - Use of MD5 as row keys - is this safe?


Re: Use of MD5 as row keys - is this safe?
Michel Segel 2012-07-22, 12:00
http://en.wikipedia.org/wiki/SHA-1

Check out the comparisons between the different SHA algos.

In theory, a collision can be found for SHA-1 (attacks faster than brute force have been published), but none has been found for SHA-2. Does that mean that a collision doesn't exist? No, it means that one hasn't been found yet, and the odds are that it won't be. Possible? Yes, but highly improbable. You have a better chance of winning the lotto...
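To put "highly improbable" into numbers, the birthday bound approximates the chance of at least one collision among n distinct keys hashed to b bits as p ≈ 1 - exp(-n(n-1) / 2^(b+1)). The Java sketch below assumes each hash output behaves like a uniformly random b-bit value (an idealization, not a proven property of any real function); the class name and the one-billion-row figure are hypothetical, for illustration only.

```java
public class BirthdayBound {

    /**
     * Approximate probability of at least one collision among n distinct
     * inputs, assuming each hash output is a uniformly random value of the
     * given bit width: p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
     */
    public static double collisionProbability(double n, int bits) {
        double exponent = n * (n - 1) / Math.pow(2, bits + 1);
        // expm1 keeps precision when the exponent is tiny, as it is for
        // 128- and 160-bit hashes; plain 1 - exp(...) would round to 0.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        double rows = 1e9; // hypothetical table size: one billion row keys
        System.out.println("128-bit (MD5):   " + collisionProbability(rows, 128));
        System.out.println("160-bit (SHA-1): " + collisionProbability(rows, 160));
        // For contrast, a 32-bit hash reaches roughly even odds at ~77,000 keys.
        System.out.println("32-bit, 77163 keys: " + collisionProbability(77163, 32));
    }
}
```

Under those assumptions a billion keys give about 1.5 × 10^-21 for a 128-bit hash and about 3.4 × 10^-31 for a 160-bit hash. This covers only accidental collisions; deliberately constructed collisions are a separate question, and that is where SHA-1's published weaknesses matter.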

The point was that if you are going to hash your key and then concatenate the original key, you would be better off looking at the SHA-1 option. You have to consider a few factors:
1: Availability of the algorithm. SHA-1 is in the standard Java API and is readily available.
2: Speed. Is SHA-1 fast enough? Maybe, depending on your requirements. For most, I'd say probably.
3: Size of key. A SHA-1 hash is 20 bytes, while an MD5 hash is 16 bytes plus the length of the original key, so SHA-1 alone is smaller for any original key longer than 4 bytes.
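The size trade-off in point 3 can be made concrete with the standard java.security.MessageDigest API. This is a minimal sketch, assuming UTF-8 string keys and a plain byte-array row key (no HBase client classes); RowKeys, sha1Key and md5PrefixedKey are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeys {

    // MD5 and SHA-1 are required on every Java platform, so the checked
    // exception is rethrown unchecked instead of leaking into callers.
    private static byte[] digest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    /** Option A: the row key is the SHA-1 digest alone (always 20 bytes). */
    public static byte[] sha1Key(String originalKey) {
        return digest("SHA-1", originalKey.getBytes(StandardCharsets.UTF_8));
    }

    /** Option B: the row key is the MD5 digest (16 bytes) followed by the original key. */
    public static byte[] md5PrefixedKey(String originalKey) {
        byte[] original = originalKey.getBytes(StandardCharsets.UTF_8);
        byte[] md5 = digest("MD5", original);
        byte[] key = new byte[md5.length + original.length];
        System.arraycopy(md5, 0, key, 0, md5.length);
        System.arraycopy(original, 0, key, md5.length, original.length);
        return key;
    }

    public static void main(String[] args) {
        String userId = "user-000123456789"; // hypothetical 17-byte original key
        System.out.println("SHA-1 only:     " + sha1Key(userId).length + " bytes");        // 20
        System.out.println("MD5 + original: " + md5PrefixedKey(userId).length + " bytes"); // 33
    }
}
```

The SHA-1-only key stays at 20 bytes however long the original key is, but it no longer contains that key, so uniqueness rests entirely on the absence of collisions. The MD5-prefixed key grows with the original key but stays unique even if two MD5 digests were ever to collide.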

Just food for thought...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jul 20, 2012, at 3:35 PM, Joe Pallas <[EMAIL PROTECTED]> wrote:

>
> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>
>> I don't believe that there have been any reports of collisions, but if you are concerned, you could use SHA-1 for generating the hash. Relatively speaking, SHA-1 is slower, but still fast enough for most applications.
>
> Every hash function can have collisions, by definition.  If the correctness of your design depends on collisions being impossible, rather than very rare, then your design is faulty.
>
> Cryptographic hash functions have the property that it is computationally hard to create inputs that match a given output.  That doesn’t in itself make cryptographic hash functions better than other hash functions for avoiding hot-spotting.  (But it does usually make cryptographic hash functions more expensive to compute than other hash functions.)
>
> You may want to look at <http://www.strchr.com/hash_functions>  and <http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633>.
>
> Hope this helps,
> joe
>
>
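Joe's two points (every hash function has collisions, and a cryptographic hash is not needed merely to avoid hot-spotting) can be illustrated together. This is a minimal sketch, assuming a hypothetical row-key layout of one bucket byte derived from a cheap hash, followed by the original key. java.util.zip.CRC32 is used only because it ships with the JDK, not as a recommendation (the links above compare better-studied options); BucketedKeys, BUCKETS and bucketedKey are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

public class BucketedKeys {

    public static final int BUCKETS = 16; // hypothetical number of key buckets

    /** Bucket number derived from a cheap, non-cryptographic CRC32. */
    static int bucket(byte[] original) {
        CRC32 crc = new CRC32();
        crc.update(original, 0, original.length);
        return (int) (crc.getValue() % BUCKETS);
    }

    /** Hypothetical row-key layout: one bucket byte, then the original key bytes. */
    public static byte[] bucketedKey(String originalKey) {
        byte[] original = originalKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[1 + original.length];
        key[0] = (byte) bucket(original);
        System.arraycopy(original, 0, key, 1, original.length);
        return key;
    }

    public static void main(String[] args) {
        Set<Integer> prefixes = new HashSet<>();
        Set<ByteBuffer> rowKeys = new HashSet<>(); // ByteBuffer compares by content
        // 17 distinct keys into 16 buckets: at least two must share a prefix.
        for (int i = 0; i <= BUCKETS; i++) {
            byte[] key = bucketedKey("user-" + i);
            prefixes.add((int) key[0]);
            rowKeys.add(ByteBuffer.wrap(key));
        }
        System.out.println("distinct prefixes: " + prefixes.size()); // at most 16
        System.out.println("distinct row keys: " + rowKeys.size());  // 17
    }
}
```

Because the original key is kept after the prefix, a prefix collision only places two rows in the same bucket rather than merging them, so correctness never depends on the hash being collision-free. The costs are one extra byte per key and, for range scans over original keys, a fan-out across all buckets.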