Re: Region Splits
Mark

Key designs depend on expected access patterns and use cases. From a
theoretical standpoint, what you are saying will work to distribute
writes, but if you want to access a small range of ids, you'll need to
fan out your reads and can't leverage short scans.
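
To make the trade-off concrete, here is a minimal sketch in plain Python (no HBase client involved). The helper names reverse_key and keys_for_range and the fixed width of 6 digits are assumptions made for illustration only. Zero-padding before reversing is also an assumption, added so that ids of different lengths do not produce ambiguous keys.

```python
def reverse_key(seq_id: int, width: int = 6) -> str:
    """Zero-pad the id to a fixed width (assumed), then reverse its digits."""
    return str(seq_id).zfill(width)[::-1]


def keys_for_range(start: int, stop: int, width: int = 6) -> list[str]:
    """Row keys needed to read ids in [start, stop).

    Consecutive ids no longer sort next to each other once reversed, so
    the range cannot be covered by one short scan. Each id becomes its
    own point lookup (the "fan out" mentioned above).
    """
    return [reverse_key(i, width) for i in range(start, stop)]


if __name__ == "__main__":
    keys = keys_for_range(510911, 510915)
    for seq_id, key in zip(range(510911, 510915), keys):
        # The leading character of the reversed key is the last digit of
        # the id, so consecutive writes land in different key ranges.
        print(seq_id, "=>", key, "leading digit:", key[0])
```

Writes for consecutive ids spread across key ranges that start with different digits, but reading ids 510911 through 510914 takes four separate lookups instead of one contiguous scan.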

Amandeep

On Nov 22, 2011, at 4:55 PM, Mark <[EMAIL PROTECTED]> wrote:

> I just thought of something.
>
> In cases where the id is sequential, couldn't one simply reverse the id to get a more uniform distribution?
>
> 510911 => 119015
> 510912 => 219015
> 510913 => 319015
> 510914 => 419015
>
> That seems like a reasonable alternative that doesn't require prefixing each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>
>
> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>> If you increase the region size to 2GB, then all regions (current and new)
>> will avoid a split until their aggregate StoreFile size reaches that
>> limit.  Reorganizing the regions for a uniform growth pattern is really a
>> schema design problem.  There is the capability to merge two adjacent
>> regions if you know that your data growth pattern is non-uniform.
>> StumbleUpon & other companies have more experience with those utilities
>> than I do.
>>
>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
>> lean towards increasing the region size.  HFile scalability code is more
>> mature/stable than the region splitting code.  Plus, automatic region
>> splitting is harder to optimize & debug when failures occur.
>>
>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>
>>> What will happen if we increase the region size, say from the current
>>> value of 256 MB to a new value of 2GB?
>>> Will existing regions continue to use only 256 MB space?
>>>
>>> Is there a way to reorganize the regions so that each region grows to
>>> 2GB size?
>>>
>>> Thanks,
>>> Srikanth
>>>
>>> -----Original Message-----
>>> From: Nicolas Spiegelberg [mailto:[EMAIL PROTECTED]]
>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: Region Splits
>>>
>>> No.  The purpose of major compactions is to merge & dedupe within a region
>>> boundary.  Compactions will not alter region boundaries, except in the
>>> case of splits where a compaction is necessary to filter out any rows from
>>> the parent region that are no longer applicable to the daughter region.
>>>
>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> Will major compactions take care of merging "older" regions or adding
>>>> more key/values to them as the number of regions grows?
>>>>
>>>> Regards,
>>>> Srikanth
>>>>
>>>> -----Original Message-----
>>>> From: Amandeep Khurana [mailto:[EMAIL PROTECTED]]
>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>> To: [EMAIL PROTECTED]
>>>> Subject: Re: Region Splits
>>>>
>>>> Mark,
>>>>
>>>> Yes, your understanding is correct. If your keys are sequential
>>>> (timestamps, etc.), you will always be writing to the end of the table
>>>> and "older" regions will not get any writes. This is one of the
>>>> arguments against using sequential keys.
>>>>
>>>> -ak
>>>>
>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>>>> split it will split at the halfway mark so there will be two regions as
>>>>> follows:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-END]
>>>>>
>>>>> So now at this point all inserts will be writing to Region2 only,
>>>>> correct?
>>>>> Now at some point Region2 will need to split and it will look like the
>>>>> following before the split:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-150]
>>>>>
>>>>> After the split it will look like:
>>>>>
>>>>> Region1 [START-49]