HBase, mail # user - Region Splits

Re: Region Splits
Nicolas Spiegelberg 2011-11-21, 05:39
Sequential writes are also an argument for pre-splitting and hash
prefixing. In other words, pre-split your table into N regions instead of
the default of 1, and transform your keys into:

new_key = md5(old_key) + old_key
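A minimal Python sketch of the transform above. It assumes the MD5 digest is hex-encoded before being prepended (the email does not say whether raw digest bytes or hex are used). `salted_key` and `split_points` are hypothetical helper names. `split_points` shows one possible way to choose the N-1 pre-split boundaries: divide the space of 8-hex-digit prefixes evenly, so each region owns an equal slice of the hash space.

```python
import hashlib


def salted_key(old_key: bytes) -> bytes:
    """new_key = md5(old_key) + old_key, with the digest hex-encoded (an assumption)."""
    return hashlib.md5(old_key).hexdigest().encode("ascii") + old_key


def split_points(n_regions: int) -> list:
    """Hypothetical helper: n_regions - 1 split keys that divide the space of
    8-hex-digit prefixes evenly, so each pre-split region owns an equal slice."""
    space = 16 ** 8
    return [format(i * space // n_regions, "08x").encode("ascii")
            for i in range(1, n_regions)]


print(salted_key(b"row-0000000042"))
print(split_points(4))   # [b'40000000', b'80000000', b'c0000000']
```

Lowercase hex is used on both sides so that the byte-wise ordering of the split keys matches the ordering of the prefixed row keys.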

With this method, writes that are sequential in old_key are spread evenly
across all regions. Hash prefixing does have limitations: for example, you
can no longer efficiently scan a contiguous range of old_key values,
because adjacent keys end up in different regions. It's a tradeoff between
even write distribution and richer query options.
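To make both halves of that tradeoff concrete, here is a small Python simulation. The 4-region pre-split, the 10,000 zero-padded sequential keys, and all helper names are illustrative assumptions, not anything from the thread. It counts which pre-split region each write lands in with and without the hash prefix, and which regions a scan over 100 consecutive old keys would have to visit once the keys are salted.

```python
import bisect
import hashlib
from collections import Counter


def salted_key(old_key: bytes) -> bytes:
    # Assumption: hex-encoded MD5 prefix, i.e. new_key = md5(old_key) + old_key
    return hashlib.md5(old_key).hexdigest().encode("ascii") + old_key


# Hypothetical pre-split: 4 regions with boundaries spread evenly over the hex prefix space.
SPLIT_KEYS = [b"40000000", b"80000000", b"c0000000"]


def region_for(row_key: bytes) -> int:
    # Region i covers [SPLIT_KEYS[i-1], SPLIT_KEYS[i]); bisect_right returns that i.
    return bisect.bisect_right(SPLIT_KEYS, row_key)


def writes_per_region(old_keys, salt: bool) -> Counter:
    """Count how many writes land in each region, with or without the hash prefix."""
    counts = Counter()
    for k in old_keys:
        counts[region_for(salted_key(k) if salt else k)] += 1
    return counts


def regions_for_range_scan(old_keys) -> set:
    """Regions that a scan over a contiguous old_key range must visit once keys are salted."""
    return {region_for(salted_key(k)) for k in old_keys}


sequential = [b"%010d" % i for i in range(10_000)]
plain = writes_per_region(sequential, salt=False)   # everything lands in one region
salted = writes_per_region(sequential, salt=True)   # spread roughly evenly over 4 regions
scan = regions_for_range_scan(sequential[100:200])  # a 100-key range now touches every region
```

The last line illustrates the cost side: consecutive old keys are scattered across regions, so a range scan over them has to visit every region and the results must be merged on the client.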

On 11/20/11 7:54 PM, "Amandeep Khurana" <[EMAIL PROTECTED]> wrote:

>Yes, your understanding is correct. If your keys are sequential, you will
>always be writing to the end of the table, and "older" regions will not
>get any more writes. This is one of the arguments against sequential
>keys.
>On Sun, Nov 20, 2011 at 11:33 AM, Mark <[EMAIL PROTECTED]> wrote:
>> Say we have a use case that has sequential row keys and we have rows
>> 0-99. Let's assume that 100 rows = the split size. Now when there is a
>> split it will split at the halfway mark so there will be two regions as
>> follows:
>> Region1 [START-49]
>> Region2 [50-END]
>> So now, at this point, all inserts will be writing to Region2 only.
>> Now at some point Region2 will need to split and it will look like the
>> following before the split:
>> Region1 [START-49]
>> Region2 [50-150]
>> After the split it will look like:
>> Region1 [START-49]
>> Region2 [50-100]
>> Region3 [101-END]
>> And this pattern will continue, correct? My question is: when a use
>> case has sequential keys, how would any of the older regions ever
>> receive any more writes? It seems like they would always be stuck at
>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>> Thanks
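The scenario in the question can be checked with a toy Python model. This is a simplification, not HBase's real split policy (HBase decides splits from stored data size rather than row count): a region splits at its midpoint as soon as it holds `max_rows` rows. `simulate_splits` is a hypothetical name, and shuffled insertion order is used only as a rough stand-in for what a hash prefix achieves.

```python
import bisect
import random


def simulate_splits(keys, max_rows):
    """Insert integer keys in the given order into a table that starts as one region.
    When a region reaches max_rows rows it splits at its midpoint (the simplified
    model used in the question). Returns (region start keys, rows per region)."""
    starts = [0]      # assumption: keys are non-negative ints, so region 0 starts at START
    regions = [[]]    # sorted keys held by each region, in key order
    for key in keys:
        i = bisect.bisect_right(starts, key) - 1   # region whose range contains key
        bisect.insort(regions[i], key)
        if len(regions[i]) >= max_rows:
            mid = len(regions[i]) // 2
            left, right = regions[i][:mid], regions[i][mid:]
            regions[i:i + 1] = [left, right]
            starts.insert(i + 1, right[0])
    return starts, [len(r) for r in regions]


# Sequential keys: every insert goes to the last region, so each older region
# is frozen at max_rows // 2 right after the split that created it.
seq_starts, seq_sizes = simulate_splits(range(1000), max_rows=100)

# Same keys in random order (roughly what a hash prefix achieves): older
# regions keep receiving writes and grow past max_rows // 2.
shuffled = list(range(1000))
random.Random(0).shuffle(shuffled)
rnd_starts, rnd_sizes = simulate_splits(shuffled, max_rows=100)
```

With sequential keys the region boundaries come out as 0, 50, 100, ... and every region holds exactly 50 rows, which matches the "stuck at MaxRegionSize/2" observation; with randomized order, older regions continue to absorb writes.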