HBase >> mail # user >> Region Splits

If you increase the region size to 2GB, then all regions (current and new)
will avoid a split until their aggregate StoreFile size reaches that
limit.  Reorganizing the regions for a uniform growth pattern is really a
schema design problem.  There is the capability to merge two adjacent
regions if you know that your data growth pattern is non-uniform.
StumbleUpon & other companies have more experience with those utilities
than I do.
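
As a rough illustration of the first point, here is a toy model in Python (not HBase code). It assumes the limit in question is the `hbase.hregion.max.filesize` setting and treats a region as a split candidate once its size reaches that limit. The names `needs_split` and `region_sizes`, and the sizes themselves, are made up for the example.

```python
MB = 1024 * 1024
GB = 1024 * MB

def needs_split(region_size_bytes, max_filesize_bytes):
    """Toy rule: a region becomes a split candidate once it reaches the limit."""
    return region_size_bytes >= max_filesize_bytes

# Hypothetical sizes of regions that were created under the old 256 MB limit.
region_sizes = [128 * MB, 128 * MB, 200 * MB, 300 * MB]

old_limit = 256 * MB
new_limit = 2 * GB

# Evaluate every region, old or new, against each limit.
split_before = [needs_split(size, old_limit) for size in region_sizes]
split_after = [needs_split(size, new_limit) for size in region_sizes]

print("split candidates at 256 MB:", split_before)
print("split candidates at 2 GB:  ", split_after)
```

In this model the existing regions are not rewritten or resized when the limit changes; they simply stop qualifying for a split until they grow to the new limit, which only happens if they still receive writes.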

Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
lean towards increasing the region size.  HFile scalability code is more
mature/stable than the region splitting code.  Plus, automatic region
splitting is harder to optimize & debug when failures occur.

On 11/22/11 12:20 PM, "Srikanth P. Shreenivas" wrote:

>Thanks Nicolas for the clarification.  I had a follow-up query.
>What will happen if we increased the region size, say from current value
>of 256 MB to a new value of 2GB?
>Will existing regions continue to use only 256 MB space?
>Is there a way to reorganize the regions so that each region grows to
>2GB size?
>-----Original Message-----
>From: Nicolas Spiegelberg [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, November 22, 2011 10:59 PM
>Subject: Re: Region Splits
>No.  The purpose of major compactions is to merge & dedupe within a region
>boundary.  Compactions will not alter region boundaries, except in the
>case of splits where a compaction is necessary to filter out any rows from
>the parent region that are no longer applicable to the daughter region.
>On 11/22/11 9:04 AM, "Srikanth P. Shreenivas" wrote:
>>Will major compactions take care of merging "older" regions or adding
>>more key/values to them as the number of regions grows?
>>-----Original Message-----
>>From: Amandeep Khurana [mailto:[EMAIL PROTECTED]]
>>Sent: Monday, November 21, 2011 7:25 AM
>>Subject: Re: Region Splits
>>Yes, your understanding is correct. If your keys are sequential,
>>you will always be writing to the end of the table and "older"
>>regions will not get any writes. This is one of the arguments against
>>sequential keys.
>>On Sun, Nov 20, 2011 at 11:33 AM, Mark <[EMAIL PROTECTED]> wrote:
>>> Say we have a use case that has sequential row keys and we have rows
>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>> split it will split at the halfway mark so there will be two regions as
>>> follows:
>>> Region1 [START-49]
>>> Region2 [50-END]
>>> So now at this point all inserts will be writing to Region2 only.
>>> Now at some point Region2 will need to split and it will look like the
>>> following before the split:
>>> Region1 [START-49]
>>> Region2 [50-150]
>>> After the split it will look like:
>>> Region1 [START-49]
>>> Region2 [50-99]
>>> Region3 [100-END]
>>> And this pattern will continue, correct? My question is: when there is a
>>> case that has sequential keys, how would any of the older regions ever
>>> receive any more writes? It seems like they would always be stuck at
>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>> Thanks
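
The split pattern described in the original question can be sketched as a small Python simulation. This is a toy model under stated assumptions, not HBase's actual split logic: rows are counted instead of bytes, a region splits exactly in half as soon as it holds `split_size` rows, and keys arrive strictly in increasing order. The function name `simulate_sequential_inserts` and the `[first_key, row_count]` representation are invented for the example.

```python
def simulate_sequential_inserts(total_rows, split_size):
    """Insert keys 0..total_rows-1 in order and split full regions in half.

    Each region is a list [first_key, row_count]; the last region is
    open-ended, so it receives every new (larger) key.
    """
    regions = [[0, 0]]
    for _key in range(total_rows):
        last = regions[-1]  # sequential keys always land in the last region
        last[1] += 1
        if last[1] >= split_size:
            half = last[1] // 2
            # The parent keeps the lower half; the daughter takes the upper half.
            daughter_start = last[0] + half
            daughter_rows = last[1] - half
            last[1] = half
            regions.append([daughter_start, daughter_rows])
    return regions


regions = simulate_sequential_inserts(total_rows=320, split_size=100)
for first_key, row_count in regions:
    print(f"region starting at key {first_key}: {row_count} rows")
```

With these inputs, every region except the last one ends up holding `split_size / 2` rows and never grows again, which matches the "stuck at MaxRegionSize/2" observation above. Only the final, open-ended region keeps receiving writes.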