HBase, mail # user - HBase Region/Table Hotspotting


Re: HBase Region/Table Hotspotting
Joarder KAMAL 2013-02-11, 05:56
Thanks, Lars, for explaining the causes of hotspotting and the key design
techniques.
I'm wondering whether it is possible to change the key design (e.g. from
sequential keys to salted keys) at run time in a production system. What
would the impact be?

To Ted,
Thanks a lot for pointing out [HBASE-7667]. Interesting idea indeed. And
Matt Corgan explained the trade-offs between having fewer and more regions.
He also pointed out how a large number of regions can impact the compaction
process. I am not an expert on HBase, but what do you think about how to
find the optimal number of stripes or sub-regions for each region?
Actually, I didn't get the idea of having fixed-boundary stripes.

Thanks again.
The HBase community is really great!

Regards,
Joarder Kamal

On 11 February 2013 16:14, lars hofhansl <[EMAIL PROTECTED]> wrote:

> The most common cause for hotspotting is inserting rows with monotonically
> increasing row keys.
> In that case only the last region will get the writes, and no amount of
> splitting will fix that (only one region server will hold the last region of
> the table, regardless of how small it is).
> There are ways around this. If you generate keys, make sure they are not
> monotonically increasing. For example, if you do not care about the sort
> order of the keys with respect to each other, you could reverse the bytes
> before you use them as the row key. Another option is to prefix the key with
> a hash of the key (but then you lose the ability to do range scans across keys).
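A minimal simulation of the effect described above, written in Java since that is HBase's client language. It does not use the HBase API: a `TreeMap` of region start keys stands in for the region layout, each row goes to the region with the greatest start key not greater than the row key, and keys are encoded as fixed-width lowercase hex so that `String` order matches HBase's unsigned byte order. The class name, the 1,000 pre-existing rows, the four equal-count regions and the 2-byte MD5 prefix are all illustrative assumptions, not HBase defaults.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

public class KeyDistribution {
    static final int REGIONS = 4; // hypothetical number of regions in the table

    // Fixed-width lowercase hex, so String order equals unsigned byte order.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Scheme 1: the raw big-endian id, i.e. a monotonically increasing key.
    static String sequentialKey(long id) {
        return hex(ByteBuffer.allocate(8).putLong(id).array());
    }

    // Scheme 2: the same 8 bytes reversed, so the fastest-changing byte comes first.
    static String reversedKey(long id) {
        byte[] b = ByteBuffer.allocate(8).putLong(id).array();
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
        return hex(b);
    }

    // Scheme 3: the original key prefixed with the first 2 bytes of its MD5 hash.
    static String hashPrefixedKey(long id) {
        String key = sequentialKey(id);
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return hex(Arrays.copyOf(digest, 2)) + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulates a table that already holds ids 0..999, split into REGIONS regions
    // with equal row counts, then counts which region each new id 1000..1999 hits.
    static int[] writesPerRegion(Function<Long, String> keyFn) {
        List<String> existing = new ArrayList<>();
        for (long id = 0; id < 1000; id++) existing.add(keyFn.apply(id));
        Collections.sort(existing);
        TreeMap<String, Integer> regionByStartKey = new TreeMap<>();
        regionByStartKey.put("", 0); // the first region starts at the empty key
        for (int r = 1; r < REGIONS; r++) {
            regionByStartKey.put(existing.get(r * existing.size() / REGIONS), r);
        }
        int[] writes = new int[REGIONS];
        for (long id = 1000; id < 2000; id++) {
            // A row belongs to the region with the greatest start key <= row key.
            writes[regionByStartKey.floorEntry(keyFn.apply(id)).getValue()]++;
        }
        return writes;
    }

    public static void main(String[] args) {
        // sequential -> [0, 0, 0, 1000]: every new write lands in the last region
        System.out.println("sequential:  " + Arrays.toString(writesPerRegion(KeyDistribution::sequentialKey)));
        System.out.println("reversed:    " + Arrays.toString(writesPerRegion(KeyDistribution::reversedKey)));
        System.out.println("hash prefix: " + Arrays.toString(writesPerRegion(KeyDistribution::hashPrefixedKey)));
    }
}
```

The reversed and hash-prefixed schemes spread the new writes across the regions; their exact per-region counts depend on the key bytes and are not meaningful on their own. Both give up the natural sort order of the ids, which is the range-scan trade-off mentioned above.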
>
> If you still need to scan rows according to their sort order, you can
> "salt" (as some call it) the key by prefixing it with one of a limited number
> of random single digits (maybe 5-10 different numbers). You could also derive
> the prefix from the key modulo that number. Each scan then has to issue
> multiple scans in parallel, one for each of the possible prefix numbers.
> (In fact, that is a pretty effective way to avoid hotspotting and to
> parallelize your scans, but it needs some client-side code to reconcile the
> parallel scans.)
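The salting approach, including the client-side reconciliation, can be sketched as follows. This is a hedged, in-memory stand-in rather than HBase client code: a `TreeMap` plays the sorted table, each per-bucket sub-scan is a `subMap` call run on an `ExecutorService`, and the client restores global key order with a heap-based k-way merge of the already-sorted partial results. The bucket count of 8, the one-digit `id mod 8` prefix and the zero-padded key format are assumptions chosen for illustration; `Map.entry` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SaltedScan {
    static final int BUCKETS = 8; // hypothetical; the mail suggests roughly 5-10

    // Original (unsalted) key, zero-padded so string order matches numeric order.
    static String rowKey(long id) { return String.format("%010d", id); }

    // Salt = id mod BUCKETS as a one-digit prefix (the "mod of the key" variant).
    static String salted(long id) { return Math.floorMod(id, BUCKETS) + rowKey(id); }

    // Stand-in for an HBase table: rows kept sorted by salted row key.
    final TreeMap<String, String> table = new TreeMap<>();

    void put(long id, String value) { table.put(salted(id), value); }

    // Range scan over ORIGINAL keys [startId, stopId): one sub-scan per bucket,
    // run in parallel, then a k-way merge on the client to restore key order.
    List<Map.Entry<String, String>> scan(long startId, long stopId) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(BUCKETS);
        try {
            List<Future<List<Map.Entry<String, String>>>> futures = new ArrayList<>();
            for (int b = 0; b < BUCKETS; b++) {
                String from = b + rowKey(startId);
                String to = b + rowKey(stopId);
                futures.add(pool.submit(() -> {
                    List<Map.Entry<String, String>> rows = new ArrayList<>();
                    for (Map.Entry<String, String> e : table.subMap(from, to).entrySet()) {
                        // Strip the one-character salt before returning the row.
                        rows.add(Map.entry(e.getKey().substring(1), e.getValue()));
                    }
                    return rows; // already sorted within this bucket
                }));
            }
            List<List<Map.Entry<String, String>>> parts = new ArrayList<>();
            for (Future<List<Map.Entry<String, String>>> f : futures) parts.add(f.get());

            // Heap of bucket indexes, ordered by the key at each bucket's cursor.
            // A bucket's cursor only advances while that bucket is out of the heap.
            int[] pos = new int[parts.size()];
            PriorityQueue<Integer> heap = new PriorityQueue<>(
                    Comparator.comparing((Integer i) -> parts.get(i).get(pos[i]).getKey()));
            for (int i = 0; i < parts.size(); i++) {
                if (!parts.get(i).isEmpty()) heap.add(i);
            }
            List<Map.Entry<String, String>> merged = new ArrayList<>();
            while (!heap.isEmpty()) {
                int i = heap.poll();
                merged.add(parts.get(i).get(pos[i]++));
                if (pos[i] < parts.get(i).size()) heap.add(i);
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        SaltedScan t = new SaltedScan();
        for (long id = 0; id < 100; id++) t.put(id, "v" + id);
        // Prints 0000000010=v10 through 0000000019=v19, in key order.
        for (Map.Entry<String, String> row : t.scan(10, 20)) System.out.println(row);
    }
}
```

Deriving the prefix from the key (rather than from a truly random digit) keeps it recomputable, so a single-row get needs only one lookup; with a random digit, a get would have to probe every bucket.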
>
> Another reason for hotspotting is inserting new versions of a small-ish
> set of row keys. In that case splitting might help, because it will
> reduce the likelihood of all those keys falling into the same region.
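To see why splitting can help with a small set of hot rows, the hedged sketch below counts writes per region before and after a single split, again using a `TreeMap` of start keys instead of the HBase API. The region names, the split point `"c"` and the four hot keys are hypothetical. In a real cluster the load only spreads across machines once one of the daughter regions is served by a different region server (for example after the balancer moves it).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitHotRegion {
    // Counts writes per region when `updates` new versions are written
    // round-robin to the same small set of hot row keys.
    static Map<String, Integer> writesPerRegion(TreeMap<String, String> regionByStartKey,
                                                List<String> hotKeys, int updates) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 0; n < updates; n++) {
            String row = hotKeys.get(n % hotKeys.size());
            // A row is served by the region with the greatest start key <= row key.
            String region = regionByStartKey.floorEntry(row).getValue();
            counts.merge(region, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hot = Arrays.asList("apple", "banana", "cherry", "date");
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "R1");  // R1 covers ["", "m")
        regions.put("m", "R2"); // R2 covers ["m", end)
        System.out.println("before split: " + writesPerRegion(regions, hot, 1000));
        // prints: before split: {R1=1000}

        // Split R1 at "c" into daughters R1a ["", "c") and R1b ["c", "m").
        regions.put("", "R1a");
        regions.put("c", "R1b");
        System.out.println("after split: " + writesPerRegion(regions, hot, 1000));
        // prints: after split: {R1a=500, R1b=500}
    }
}
```

Splitting cannot help when a single row is the hotspot, because a row is never divided between regions.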
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Joarder KAMAL <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Sent: Sunday, February 10, 2013 6:17 PM
> Subject: HBase Region/Table Hotspotting
>
> This is my first email to the group. I have a fairly general and
> open-ended question, but I hope to get some reasoning from the HBase user
> community.
> I am a very basic HBase user and still learning. I intend to use HBase
> in one of our research projects. Recently I was looking through Lars
> George's book "HBase - The Definitive Guide", and two particular topics
> caught my eye. One is 'Region and Table Hotspotting' and the other is
> 'Region Auto-Sharding and Merging'.
>
> *Scenario: *
> If a hotspot is created in a particular region or in a table (having
> multiple regions) due to a sudden workload change, then one may split the
> region into smaller pieces and distribute them across a number of
> available physical machines in the cluster. This process would require
> large data transfers between different machines in the cluster and incur a
> performance cost. One may also change the 'key' definition and manage the
> regions. But I am not sure how effective or sensible it is to change the key
> design on a production system.
>
> *Questions:*
>
>    1. How often do you face region or table hotspotting in HBase
>    production systems?
>    2. If a hotspot is created, how quickly is it automatically cleared
>    (assuming a sudden workload change)?
>    3. How often does this kind of situation happen: a hotspot is detected and
>    vanishes before any action is taken? Or do hotspots persist for longer
>    periods of time?
>    4. If the hotspot persists, how is it handled (in general) in
>    production systems?
>    5. How is the large data transfer cost minimized or avoided for re-sharding