HBase >> mail # user >> Region Splits


The downside of hashing is not that it's unpredictable, but that it's
non-reversible (which is why you need to append the original key).
Reversing should be fine; just make sure that you perform a byte-order
reversal so that you get a uniform distribution.
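
To make the comparison concrete, here is a rough sketch in Python (not HBase
client code). It assumes ids are stored as 8-byte big-endian integers and
uses MD5 as the hash, which is only a guess that happens to match the
16-byte prefix mentioned below; the function names are made up for
illustration.

```python
import hashlib
import struct


def reversed_key(row_id: int) -> bytes:
    """Pack the id as an 8-byte big-endian integer, then reverse the bytes.

    The fastest-changing (least significant) byte ends up first, so
    sequential ids land on different leading bytes.
    """
    return struct.pack(">Q", row_id)[::-1]


def restore_id(key: bytes) -> int:
    """Undo reversed_key(): reversal loses nothing, so no extra bytes are needed."""
    return struct.unpack(">Q", key[::-1])[0]


def hashed_key(row_id: int) -> bytes:
    """Assumed hashing scheme: 16-byte MD5 digest followed by the original id.

    The digest cannot be inverted, so the original id has to be appended
    if you ever want to read it back from the row key.
    """
    raw = struct.pack(">Q", row_id)
    return hashlib.md5(raw).digest() + raw


# Sequential ids from the thread: the leading byte of the reversed key
# steps through bf, c0, c1, c2 instead of staying constant.
for row_id in range(510911, 510915):
    print(row_id, reversed_key(row_id).hex(), len(hashed_key(row_id)))
```

With this layout the reversed key stays at 8 bytes, while the hashed key
grows to 24. Any 256 consecutive ids cover every possible leading byte once.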

On 11/22/11 7:47 PM, "Mark" <[EMAIL PROTECTED]> wrote:

>Ok so this would be "short scans"?
>
>In my use case this would be unnecessary, so I think I'm going to run with
>the reversed id technique. I'm actually surprised I've never heard of
>anyone using this over the unpredictable hashing.
>
>On 11/22/11 5:35 PM, Sam Seigal wrote:
>> If you are prefixing your keys with predictable hashes, you can do
>> range scans - i.e. create a scanner for each prefix and then merge
>> results at the client. With unpredictable hashes and key reversals,
>> this might not be entirely possible.
>>
>> I remember someone on the mailing list mentioning that Mozilla Socorro
>> uses a similar technique. I haven't had a chance to look at their code
>> yet, but that is something you might want to look at.
>>
>> On Tue, Nov 22, 2011 at 5:11 PM, Mark<[EMAIL PROTECTED]>  wrote:
>>> What do you mean by "short scans"?
>>>
>>> I understand that scans will not be possible with this method, but neither
>>> would they be if I hashed them, so it seems like I'm in the same boat anyway.
>>>
>>> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>>> Mark
>>>>
>>>> Key designs depend on expected access patterns and use cases. From a
>>>> theoretical standpoint, what you are saying will work to distribute
>>>> writes but if you want to access a small range, you'll need to fan out
>>>> your reads and can't leverage short scans.
>>>>
>>>> Amandeep
>>>>
>>>> On Nov 22, 2011, at 4:55 PM, Mark<[EMAIL PROTECTED]>    wrote:
>>>>
>>>>> I just thought of something.
>>>>>
>>>>> In cases where the id is sequential, couldn't one simply reverse the id to
>>>>> get more of a uniform distribution?
>>>>>
>>>>> 510911 =>    119015
>>>>> 510912 =>    219015
>>>>> 510913 =>    319015
>>>>> 510914 =>    419015
>>>>>
>>>>> That seems like a reasonable alternative that doesn't require prefixing
>>>>> each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>>>>>
>>>>>
>>>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>>> If you increase the region size to 2GB, then all regions (current and new)
>>>>>> will avoid a split until their aggregate StoreFile size reaches that
>>>>>> limit.  Reorganizing the regions for a uniform growth pattern is really a
>>>>>> schema design problem.  There is the capability to merge two adjacent
>>>>>> regions if you know that your data growth pattern is non-uniform.
>>>>>> StumbleUpon & other companies have more experience with those utilities
>>>>>> than I do.
>>>>>>
>>>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
>>>>>> lean towards increasing the region size.  HFile scalability code is more
>>>>>> mature/stable than the region splitting code.  Plus, automatic region
>>>>>> splitting is harder to optimize & debug when failures occur.
>>>>>>
>>>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>>>> <[EMAIL PROTECTED]>     wrote:
>>>>>>
>>>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>>>
>>>>>>> What will happen if we increase the region size, say from the current
>>>>>>> value of 256 MB to a new value of 2GB?
>>>>>>> Will existing regions continue to use only 256 MB of space?
>>>>>>>
>>>>>>> Is there a way to reorganize the regions so that each region grows to
>>>>>>> 2GB in size?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Srikanth
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Nicolas Spiegelberg [mailto:[EMAIL PROTECTED]]
>>>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>>>> To: [EMAIL PROTECTED]
>>>>>>> Subject: Re: Region Splits
>>>>>>>
>>>>>>> No.  The purpose of major compactions is to merge & dedupe