HBase >> mail # user >> Region Splits


The downside of hashing is not that it's unpredictable, but that it's
non-reversible (which is why you need to append the original key).
Reversing should be fine; just make sure that you perform a byte-order
reversal so that you have uniform distribution.
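
To make the two schemes concrete, here is a minimal sketch in Python (chosen for brevity; HBase clients are usually Java). The function names `reversed_key`, `original_id`, and `hashed_key` are hypothetical, and the 8-byte big-endian id encoding and the MD5 prefix are assumptions for illustration, not anything taken from HBase itself.

```python
import hashlib
import struct


def reversed_key(row_id: int) -> bytes:
    """Encode row_id as 8 big-endian bytes, then reverse the byte order.

    The fastest-changing (least significant) byte ends up first, so
    consecutive ids are spread across the key space.
    """
    return struct.pack(">Q", row_id)[::-1]


def original_id(key: bytes) -> int:
    """Invert reversed_key: a byte reversal loses no information."""
    return struct.unpack(">Q", key[::-1])[0]


def hashed_key(row_id: int) -> bytes:
    """Prefix the encoded id with its 16-byte MD5 digest (an assumed hash).

    A hash cannot be inverted, so the original id is appended to keep
    it recoverable from the row key.
    """
    raw = struct.pack(">Q", row_id)
    return hashlib.md5(raw).digest() + raw


if __name__ == "__main__":
    for row_id in (510911, 510912, 510913, 510914):
        key = reversed_key(row_id)
        assert original_id(key) == row_id  # round trip works
        print(row_id, key.hex(), len(hashed_key(row_id)))
```

Under these assumptions the reversed key stays at 8 bytes, while the hashed form grows to 24 bytes (16 bytes of digest plus the id). In both schemes adjacent ids stop being adjacent rows, so a short scan over an id range is no longer possible.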

On 11/22/11 7:47 PM, "Mark" <[EMAIL PROTECTED]> wrote:

>Ok so this would be "short scans"?
>
>In my use case this would be unnecessary so I think I'm going to run with
>the reversed id technique. I'm actually surprised I've never heard of
>anyone using this over the non-predictable hashing.
>
>On 11/22/11 5:35 PM, Sam Seigal wrote:
>> If you are prefixing your keys with predictable hashes, you can do
>> range scans - i.e. create a scanner for each prefix and then merge
>> results at the client. With unpredictable hashes and key reversals,
>> this might not be entirely possible.
>>
>> I remember someone on the mailing list mentioning that Mozilla Socorro
>> uses a similar technique. I haven't had a chance to look at their code
>> yet, but that is something you might want to look at.
>>
>> On Tue, Nov 22, 2011 at 5:11 PM, Mark <[EMAIL PROTECTED]> wrote:
>>> What do you mean by "short scans"?
>>>
>>> I understand that scans will not be possible with this method but
>>> neither would they be if I hashed them so it seems like I'm in the
>>> same boat anyway.
>>>
>>> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>>> Mark
>>>>
>>>> Key designs depend on expected access patterns and use cases. From a
>>>> theoretical standpoint, what you are saying will work to distribute
>>>> writes but if you want to access a small range, you'll need to fan out
>>>> your reads and can't leverage short scans.
>>>>
>>>> Amandeep
>>>>
>>>> On Nov 22, 2011, at 4:55 PM, Mark <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I just thought of something.
>>>>>
>>>>> In cases where the id is sequential couldn't one simply reverse the
>>>>> id to get more of a uniform distribution?
>>>>>
>>>>> 510911 => 119015
>>>>> 510912 => 219015
>>>>> 510913 => 319015
>>>>> 510914 => 419015
>>>>>
>>>>> That seems like a reasonable alternative that doesn't require
>>>>> prefixing each row key with an extra 16 bytes. Am I wrong in
>>>>> thinking this could work?
>>>>>
>>>>>
>>>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>>> If you increase the region size to 2GB, then all regions (current
>>>>>> and new) will avoid a split until their aggregate StoreFile size
>>>>>> reaches that limit.  Reorganizing the regions for a uniform growth
>>>>>> pattern is really a schema design problem.  There is the capability
>>>>>> to merge two adjacent regions if you know that your data growth
>>>>>> pattern is non-uniform.  StumbleUpon & other companies have more
>>>>>> experience with those utilities than I do.
>>>>>>
>>>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely
>>>>>> want to lean towards increasing the region size.  HFile scalability
>>>>>> code is more mature/stable than the region splitting code.  Plus,
>>>>>> automatic region splitting is harder to optimize & debug when
>>>>>> failures occur.
>>>>>>
>>>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>>>
>>>>>>> What will happen if we increase the region size, say from the
>>>>>>> current value of 256 MB to a new value of 2GB?
>>>>>>> Will existing regions continue to use only 256 MB space?
>>>>>>>
>>>>>>> Is there a way to reorganize the regions so that each region
>>>>>>> grows to 2GB size?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Srikanth
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Nicolas Spiegelberg [mailto:[EMAIL PROTECTED]]
>>>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>>>> To: [EMAIL PROTECTED]
>>>>>>> Subject: Re: Region Splits
>>>>>>>
>>>>>>> No.  The purpose of major compactions is to merge & dedupe