Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Use of MD5 as row keys - is this safe?


Copy link to this message
-
Re: Use of MD5 as row keys - is this safe?
Ethan,

Its not a question of collision attacks, although that is an indication of the strength of the hash.

Just because both MD5 and SHA-1 yield a key of the same length, that doesn't mean that they will have the same chance of a collision. The algorithm has some impact on this too.

I agree that if you're looking to hash the key, MD5 should be good enough. If you don't like it, SHA-1 or SHA-2 are also available as part of Sun's JDK.
(See Appendix A here: http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html)

Key's longer than 512 are not exportable from the US.

The point I was making was that if you are not comfortable in using MD5 and you want to append the key to the hash, you may want to consider a different hash.

In terms of Uniqueness, take a UUID , hash it, and then truncate the hash to 128 bits from 160 bits. This actually is a form of the UUID. While this could increase the likelihood of a collision, I haven't seen one in real use.

The idea of creating a hash and then prepending it to the key is a tad pessimistic.

That was the point.
 
On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote:

> To echo Joe Pallas:
>
> Any fairly "random" hash algorithm producing the same length output should
> have about the same extremely small chance of producing the same output for
> two different inputs - a collision. It's a problem you need to be aware of
> no matter what hash algorithm you use. (Hash functions are mappings from a
> theoretically infinite input space to a finitely large output space, so
> they obviously generate the same output for multiple inputs.)
>
> SHA-1 specifically (and MD5 even more-so) has an attack that shows that
> given a specific input and output, we can calculate a new input that
> produces the same output with better than brute-force efficiency.
>
> Collisions and collision attacks are two different things. Collision
> attacks are a problem for cryptographic uses like signing, but how does
> this have anything to do with the problem of generating hBase row keys?
> Just use the fastest, most accessible, random-enough algorithm you can
> find, and if you are really worried about collisions then do something to
> ensure that the key will be unique. Right?
>
> Cheers,
> Ethan
>
> On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel <[EMAIL PROTECTED]>wrote:
>
>> http://en.wikipedia.org/wiki/SHA-1
>>
>> Check out the comparisons between the different SHA algos.
>>
>> In theory a collision was found for SHA-1, but none found for SHA-2 does
>> that mean that a collision doesn't exist? No, it means that it hasn't
>> happened yet and the odds are that it won't be found. Possible? Yes,
>> however, highly improbable. You have a better chance of winning the lotto...
>>
>> The point was that if you are going to hash your key,then concatenate the
>> initial key, you would be better off looking at the SHA-1 option. You have
>> to consider a couple of factors...
>> 1: availability of the algo. SHA-1 is in the standard java API and is
>> readily available.
>> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For
>> most, I'll say probably.
>> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and
>> the original key added.
>>
>> Just food for thought...
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jul 20, 2012, at 3:35 PM, Joe Pallas <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>>>
>>>> I don't believe that there has been any reports of collisions, but if.
>> You are concerned you could use the SHA-1 for generating the hash.
>> Relatively speaking, SHA-1is slower, but still fast enough for most
>> applications.
>>>
>>> Every hash function can have collisions, by definition.  If the
>> correctness of your design depends on collisions being impossible, rather
>> than very rare, then your design is faulty.
>>>
>>> Cryptographic hash functions have the property that it is
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB