Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # user - Use of MD5 as row keys - is this safe?


Copy link to this message
-
Re: Use of MD5 as row keys - is this safe?
Michael Segel 2012-07-23, 11:55
Ethan,

Its not a question of collision attacks, although that is an indication of the strength of the hash.

Just because both MD5 and SHA-1 yield a key of the same length, that doesn't mean that they will have the same chance of a collision. The algorithm has some impact on this too.

I agree that if you're looking to hash the key, MD5 should be good enough. If you don't like it, SHA-1 or SHA-2 are also available as part of Sun's JDK.
(See Appendix A here: http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html)

Key's longer than 512 are not exportable from the US.

The point I was making was that if you are not comfortable in using MD5 and you want to append the key to the hash, you may want to consider a different hash.

In terms of Uniqueness, take a UUID , hash it, and then truncate the hash to 128 bits from 160 bits. This actually is a form of the UUID. While this could increase the likelihood of a collision, I haven't seen one in real use.

The idea of creating a hash and then prepending it to the key is a tad pessimistic.

That was the point.
 
On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote:

> To echo Joe Pallas:
>
> Any fairly "random" hash algorithm producing the same length output should
> have about the same extremely small chance of producing the same output for
> two different inputs - a collision. It's a problem you need to be aware of
> no matter what hash algorithm you use. (Hash functions are mappings from a
> theoretically infinite input space to a finitely large output space, so
> they obviously generate the same output for multiple inputs.)
>
> SHA-1 specifically (and MD5 even more-so) has an attack that shows that
> given a specific input and output, we can calculate a new input that
> produces the same output with better than brute-force efficiency.
>
> Collisions and collision attacks are two different things. Collision
> attacks are a problem for cryptographic uses like signing, but how does
> this have anything to do with the problem of generating hBase row keys?
> Just use the fastest, most accessible, random-enough algorithm you can
> find, and if you are really worried about collisions then do something to
> ensure that the key will be unique. Right?
>
> Cheers,
> Ethan
>
> On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel <[EMAIL PROTECTED]>wrote:
>
>> http://en.wikipedia.org/wiki/SHA-1
>>
>> Check out the comparisons between the different SHA algos.
>>
>> In theory a collision was found for SHA-1, but none found for SHA-2 does
>> that mean that a collision doesn't exist? No, it means that it hasn't
>> happened yet and the odds are that it won't be found. Possible? Yes,
>> however, highly improbable. You have a better chance of winning the lotto...
>>
>> The point was that if you are going to hash your key,then concatenate the
>> initial key, you would be better off looking at the SHA-1 option. You have
>> to consider a couple of factors...
>> 1: availability of the algo. SHA-1 is in the standard java API and is
>> readily available.
>> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For
>> most, I'll say probably.
>> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and
>> the original key added.
>>
>> Just food for thought...
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jul 20, 2012, at 3:35 PM, Joe Pallas <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>>>
>>>> I don't believe that there has been any reports of collisions, but if.
>> You are concerned you could use the SHA-1 for generating the hash.
>> Relatively speaking, SHA-1is slower, but still fast enough for most
>> applications.
>>>
>>> Every hash function can have collisions, by definition.  If the
>> correctness of your design depends on collisions being impossible, rather
>> than very rare, then your design is faulty.
>>>
>>> Cryptographic hash functions have the property that it is