HBase handles balancing regions between region servers (RS)
correctly: your RSs always manage about the same number of regions
(the balancer takes care of it).
Unfortunately, balancing all the regions of one particular table
across the RSs of your cluster is not always easy, because HBase (as
of 0.90.3) always creates the new region on the same RS when it
splits a region. This means that if you start with a single-region
table and then insert lots of data into it, new regions will always
be created on the same RS (if your insert is an M/R job, you
saturate this RS). Eventually the balancer will decide to move some
of these regions to other RSs, which limits the issue, but you
cannot control when.
Here at Capptain, we solved this problem by developing a special
Python script, based on the HBase shell, that balances all the
regions of all tables across all RSs. It ensures that the regions of
each table are uniformly deployed on all RSs of the cluster, with a
minimum number of region transitions.
It is fast, and even though it can trigger a lot of region
transitions, there is very little impact at runtime and it can be
run safely.
If you are interested, just let me know, I can share it.
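To give an idea of the approach, here is a minimal Python sketch. It
is not our actual script, only an illustration under assumptions:
the current assignments are passed in as a plain dict (collecting
them from the cluster is left out), the names plan_moves and
to_shell and the region/server names are made up for the example,
and each table is balanced on its own without looking at the total
load per RS. The generated lines use the HBase shell command
move 'ENCODED_REGIONNAME', 'SERVER_NAME'; check that the shell of
your version provides it.

```python
def plan_moves(assignments, servers):
    """Plan region moves so each table is spread evenly over servers.

    assignments: {table: {encoded_region_name: server_name}}
    servers: list of live server names (every assigned server must be in it)
    Returns a list of (region, source_server, target_server) tuples.
    """
    moves = []
    for table in sorted(assignments):
        # Group this table's regions by the server hosting them.
        by_server = {s: [] for s in servers}
        for region, server in sorted(assignments[table].items()):
            by_server[server].append(region)

        # Every server gets `base` regions, and `extra` servers get one more.
        base, extra = divmod(len(assignments[table]), len(servers))

        # Give the larger quota to the most loaded servers so that
        # as few regions as possible have to leave them.
        ranked = sorted(servers, key=lambda s: -len(by_server[s]))
        quota = {s: base + (1 if i < extra else 0)
                 for i, s in enumerate(ranked)}

        # Collect regions from servers above their quota...
        surplus = []
        for s in ranked:
            while len(by_server[s]) > quota[s]:
                surplus.append((by_server[s].pop(), s))

        # ...and hand them to servers below their quota.
        for s in ranked:
            while len(by_server[s]) < quota[s]:
                region, source = surplus.pop()
                by_server[s].append(region)
                moves.append((region, source, s))
    return moves


def to_shell(moves):
    """Render the plan as HBase shell `move` commands (one per line)."""
    return ["move '%s', '%s'" % (region, target)
            for region, _source, target in moves]


if __name__ == "__main__":
    # Hypothetical cluster: one table whose 6 regions all sit on rs1.
    current = {"my_table": {"r%d" % i: "rs1" for i in range(1, 7)}}
    for line in to_shell(plan_moves(current, ["rs1", "rs2", "rs3"])):
        print(line)
```

In a real cluster the server names have the form host,port,startcode
and the region names are the encoded names shown in the web
interface; the printed lines could then be fed to the HBase shell.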
On 04/09/12 23:42, David Koch wrote:
> Thank you for your replies. We are using CDH4 HBase 0.92. Good call on the
> web interface. The port is blocked so I never really got a chance to test
> it. As far as manual re-balancing is concerned I will check the book.
> On Tue, Sep 4, 2012 at 5:34 PM, Guillaume Gardey <
> [EMAIL PROTECTED]> wrote:
>>> a) What is the easiest way to get an overview of how a table is
>>> distributed across regions of a cluster? I guess I could search .META.
>>> but I haven't
>>> figured out how to use filters from shell.
>>> b) What constitutes a "badly distributed" table and how can I re-balance it?
>>> c) Is b) needed at all? I know that HBase does its balancing
>>> behind the scenes.
>> I have found that
>> http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ is a good
>> source of information/tools to look at regions balancing in the cluster and
>> investigate it.
>>> As for a) I tried running this script:
>>> like so:
>>> hbase org.jruby.Main ./list_regions.rb <_my_table>
>>> but I get
>>> ArgumentError: wrong number of arguments (1 for 2)
>>> (root) at ./list_regions.rb:60
>>> If someone more proficient notices an obvious fix, I'd be glad to hear
>>> about it.
>> Concerning https://github.com/Mendeley/hbase-scripts , I am afraid that
>> this is a repository that is no longer maintained and was written for old
>> releases of hbase (cdh2 I believe). There's no plan to upgrade it to newer