Re: HBase Region/Table Hotspotting
Ted Yu 2013-02-11, 14:55
Sub-region management is at an experimental stage.
We will get a better idea once HBASE-7667 gets an in-depth review and more
cluster-level testing is done.
You can watch HBASE-7667 to get updates.
On Sun, Feb 10, 2013 at 9:56 PM, Joarder KAMAL <[EMAIL PROTECTED]> wrote:
> Thanks Lars for explaining the reasons for hotspotting and key design
> Just wondering, is it possible to alter the key design (e.g. from sequential
> keys to salted keys) at run time in a production system? What are the
> To Ted,
> Thanks a lot for pointing out [HBASE-7667]. Interesting idea indeed. And
> Matt Corgan explained the trade-offs between having fewer and more regions.
> He also pointed out how a large number of regions can impact the compaction
> process. Although I am not an expert on the HBase system, what do you think
> about how to find an optimal number of stripes or sub-regions for each
> region? Actually, I didn't get the idea of having fixed-boundary stripes.
> Thanks again.
> The HBase community is really great!!
> Joarder Kamal
> On 11 February 2013 16:14, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > The most common cause of hotspotting is inserting rows with
> > increasing row keys.
> > In that case only the last region will get the writes and no amount of
> > splitting will fix that (only one region server will hold the last region
> > of the table, regardless of how small it is).
> > There are ways around this. If you generate keys, make sure they are not
> > monotonically increasing. For example, if you do not care about the sort
> > order of the keys w.r.t. each other, you could reverse the bytes before
> > you use them as the row key. Another option is to prefix the key with a hash
> > of the key (but then you lose the ability to do range scans across keys).
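A minimal Python sketch of the two key transformations described above, using plain byte strings rather than any HBase client API. The helper names `reversed_key` and `hash_prefixed_key`, the choice of SHA-256, and the 4-byte prefix length are assumptions made for illustration; they are not part of the original message.

```python
import hashlib


def reversed_key(key: bytes) -> bytes:
    """Reverse the bytes so the fastest-changing byte comes first."""
    return key[::-1]


def hash_prefixed_key(key: bytes, prefix_len: int = 4) -> bytes:
    """Prefix the key with the leading bytes of a digest of the key.

    The original key is kept as a suffix so it can still be recovered,
    but neighbouring keys no longer sort next to each other.
    """
    return hashlib.sha256(key).digest()[:prefix_len] + key


# Sequential ids encoded as fixed-width big-endian integers (assumed layout).
sequential = [i.to_bytes(8, "big") for i in range(1000, 1004)]

for k in sequential:
    print(k.hex(), reversed_key(k).hex(), hash_prefixed_key(k).hex())
```

With the reversed form, consecutive ids start with different leading bytes, so they would tend to land in different parts of the key space instead of piling onto the last region.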
> > If you still need to scan rows according to their sort order, you can
> > "salt" (as some call it) the key by prefixing it with one of a limited number
> > of random single digits (maybe 5-10 different numbers). You could also use a
> > mod of the key. Each logical scan then has to issue multiple scans in
> > parallel, one for each of the possible prefix numbers.
> > (In fact that is a pretty effective way to avoid hotspotting and to
> > parallelize your scans, but it needs some client-side logic to reconcile the
> > parallel scans).
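The salting idea and the client-side reconciliation step can be sketched as follows. This is a hedged, in-memory illustration only: a Python dict stands in for the table, `scan_bucket` stands in for a real range scan, and the names `salted_key`, `scan_bucket`, `salted_scan`, the bucket count of 8, and the use of CRC32 as the "mod of the key" are all assumptions, not HBase APIs.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 8  # assumed; the message suggests roughly 5-10 prefixes


def salted_key(key: bytes) -> bytes:
    """Prefix the key with a single-digit bucket derived from the key itself."""
    bucket = zlib.crc32(key) % NUM_BUCKETS
    return str(bucket).encode() + key


def scan_bucket(table, bucket, start, stop):
    """Stand-in for one range scan restricted to a single salt prefix."""
    prefix = str(bucket).encode()
    lo, hi = prefix + start, prefix + stop
    # Strip the one-byte salt so rows from different buckets are comparable.
    return [(k[1:], v) for k, v in sorted(table.items()) if lo <= k < hi]


def salted_scan(table, start, stop):
    """Run one scan per bucket in parallel, then merge by original key."""
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        parts = list(pool.map(lambda b: scan_bucket(table, b, start, stop),
                              range(NUM_BUCKETS)))
    # Each part is already sorted, so a k-way merge restores global order.
    return list(heapq.merge(*parts))


table = {salted_key(b"row%04d" % i): i for i in range(100)}
result = salted_scan(table, b"row0010", b"row0020")
print([k.decode() for k, _ in result])
```

Deriving the salt from the key, rather than choosing it at random, is a design choice in this sketch: it keeps the prefix reproducible, so a single row can still be looked up without probing every bucket.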
> > Another reason for hotspotting is inserting new versions of a smallish
> > set of row keys. In that case splitting might help, because it will
> > reduce the likelihood of all those keys falling into the same region.
> > -- Lars
> > ________________________________
> > From: Joarder KAMAL <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Sent: Sunday, February 10, 2013 6:17 PM
> > Subject: HBase Region/Table Hotspotting
> > This is my first email to the group. I have a rather general and
> > open-ended question, but I hope to get some insight from the HBase user
> > community.
> > I am a very basic HBase user and still learning. My intention is to use it
> > in one of our research projects. Recently I was looking through Lars
> > George's book "HBase - The Definitive Guide" and two particular topics
> > caught my eye. One is 'Region and Table Hotspotting' and the other is
> > 'Region Auto-Sharding and Merging'.
> > *Scenario: *
> > If a hotspot is created in a particular region or in a table (having
> > multiple regions) due to a sudden workload change, then one may split the
> > region into smaller pieces and distribute them across a number of
> > available physical machines in the cluster. This process would require
> > large data transfers between different machines in the cluster and incur a
> > performance cost. One may also change the 'key' definition and manage the
> > regions. But I am not sure how effective or logical it is to change key
> > designs on a production system.
> > *Questions:*
> > 1. How often are you facing Region or Table Hotspotting in HBase