Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Is it necessary to set MD5 on rowkey?


Copy link to this message
-
Re: Is it necessary to set MD5 on rowkey?
This what wrote:
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>>

By definition the salt is a random number which is the first part of the one way crypt() function.
Using some modulo function is the second half of what I said. ;-)
Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 19, 2012, at 7:35 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> I have to disagree with the *FULL* *TABLE* *SCAN* in order to retrieve the row.
>
> If I know that I have one byte salting between 1 and 10, I will have
> to do 10 gets to get the row. And they will most probably all be on
> different RS, so it will not be more than 1 get per server. This will
> take almost the same time as doing a simple get.
>
> I understand your point that salting is inducting some bad things, but
> on the other side, it's easy and can still be usefull. Hash will allow
> you a direct access with one call, but you still need to calculate the
> hash. So what's faster? Calculate the hash and do one call to one
> server? Or go directly with one call to multiple servers? It all
> depend on the way you access your data.
>
> Personnaly, I'm using hash almost everwhere, but I still understand
> that some people might be able to use salting for their specific
> purposes.
>
> JM
>
> 2012/12/19, Michael Segel <[EMAIL PROTECTED]>:
>> Ok,
>>
>> Lets try this one more time...
>>
>> If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
>> retrieve the row.
>> If you do something like a salt that uses only  a preset of N combinations,
>> you will have to do N get()s in order to fetch the row.
>>
>> This is bad. VERY BAD.
>>
>> If you hash the row, you will get a consistent value each time you hash the
>> key.  If you use SHA-1, the odds of a collision are mathematically possible,
>> however highly improbable. So people have recommended that they append the
>> key to the hash to form the new key. Here, you might as well as truncate the
>> hash to just the most significant byte or two and the append the key. This
>> will give you enough of an even distribution that you can avoid hot
>> spotting.
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB