HBase, mail # user - Is it necessary to set MD5 on rowkey?


Re: Is it necessary to set MD5 on rowkey?
Michel Segel 2012-12-20, 01:47
This is what I wrote:
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>>

By definition, the salt is a random number which is the first part of the one-way crypt() function.
Using some modulo function is the second half of what I said. ;-)
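
To make the distinction concrete, here is a minimal sketch of the bounded-salt case, assuming a one-byte salt drawn from N buckets. The names `N_BUCKETS`, `salted_key` and `candidate_keys` are hypothetical and are not part of any HBase API. The point is only that the reader cannot recompute a random salt, so it has to probe every bucket.

```python
import random
from typing import List

N_BUCKETS = 10  # hypothetical bucket count, chosen only for illustration


def salted_key(row_key: bytes, n_buckets: int = N_BUCKETS) -> bytes:
    """Write path: prefix the key with a randomly chosen one-byte salt."""
    salt = random.randrange(n_buckets)
    return bytes([salt]) + row_key


def candidate_keys(row_key: bytes, n_buckets: int = N_BUCKETS) -> List[bytes]:
    """Read path: the salt is unknown, so every bucket is a candidate."""
    return [bytes([salt]) + row_key for salt in range(n_buckets)]


stored = salted_key(b"user42")
probes = candidate_keys(b"user42")
# Exactly one of the N probes matches the key that was actually written.
matches = [k for k in probes if k == stored]
```

Each entry in `probes` would correspond to one get(), which is where the N get()s come from.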
Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 19, 2012, at 7:35 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> I have to disagree that a *FULL* *TABLE* *SCAN* is needed to retrieve the row.
>
> If I know that I have a one-byte salt between 1 and 10, I will have
> to do 10 gets to get the row. And they will most probably all be on
> different region servers, so it will not be more than 1 get per server.
> This will take almost the same time as doing a simple get.
>
> I understand your point that salting introduces some bad things, but
> on the other hand, it's easy and can still be useful. A hash will give
> you direct access with one call, but you still need to calculate the
> hash. So what's faster? Calculating the hash and making one call to one
> server? Or going directly with one call each to multiple servers? It all
> depends on the way you access your data.
>
> Personally, I'm using hashes almost everywhere, but I still understand
> that some people might be able to use salting for their specific
> purposes.
>
> JM
>
> 2012/12/19, Michael Segel <[EMAIL PROTECTED]>:
>> Ok,
>>
>> Let's try this one more time...
>>
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>>
>> This is bad. VERY BAD.
>>
>> If you hash the row key, you will get a consistent value each time you hash
>> the key.  If you use SHA-1, a collision is mathematically possible but
>> highly improbable. So people have recommended that you append the
>> key to the hash to form the new key. Here, you might as well truncate the
>> hash to just the most significant byte or two and then append the key. This
>> will give you enough of an even distribution that you can avoid hot
>> spotting.
>
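
The truncated-hash approach described in the quoted message can be sketched as follows. This is an illustration under stated assumptions, not HBase client code: `hashed_key` and `prefix_len` are hypothetical names, and the "most significant byte or two" is taken to mean the leading bytes of the SHA-1 digest.

```python
import hashlib


def hashed_key(row_key: bytes, prefix_len: int = 2) -> bytes:
    """Prefix the key with the leading bytes of its SHA-1 digest.

    The prefix is derived from the key itself, so a reader can rebuild
    the full row key and fetch the row with a single lookup.
    """
    digest = hashlib.sha1(row_key).digest()
    return digest[:prefix_len] + row_key


written = hashed_key(b"user42")
looked_up = hashed_key(b"user42")  # same input, same row key
```

Because the original key is kept after the prefix, two keys that happen to share the same truncated hash still produce distinct row keys.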