HBase >> mail # user >> Fixing badly distributed table manually.

RE: Fixing badly distributed table manually.
> a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster?

I usually check this in the master web interface (host:60010).
Click on a table and scroll down. There is a region count for that table across the cluster.

> b) What constitutes a "badly distributed" table and how can I re-balance manually?

I think the answer to this question is a manual split. There is a chapter in the book about it.
I am looking forward to an answer from the more experienced folks ;)

> c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

From my experience, yes. HBase does not balance as evenly as I need. In the worst case I have
a difference of 16 regions (32 against 48) in a 10-machine cluster.
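
To put a number on that kind of imbalance, here is a rough Python sketch. It assumes you have already copied the per-server region counts by hand from the web interface into a dict; the function name and the server names are made up, and only the 32 and 48 come from my cluster. Note that it only looks at region counts, so it will not show a single hot region that receives most of the reads.

```python
from statistics import mean


def region_skew(regions_per_server):
    """Summarize how unevenly regions are spread across region servers.

    regions_per_server maps a server name to its region count.
    """
    counts = list(regions_per_server.values())
    avg = mean(counts)
    return {
        "min": min(counts),
        "max": max(counts),
        "spread": max(counts) - min(counts),   # absolute gap between extremes
        "max_over_mean": max(counts) / avg,    # 1.0 would be a perfectly even spread
    }


# Hypothetical 10-server cluster: only 32 and 48 are real figures,
# the other eight servers are filled in with 40 for illustration.
counts = {"rs1": 32, "rs2": 48}
counts.update({f"rs{i}": 40 for i in range(3, 11)})

summary = region_skew(counts)
print(summary)
```

With these made-up numbers the spread is 16 regions and the busiest server holds 1.2 times the average.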

Hoping for a great answer so I don't have to do manual splits ;)

Regards,
Pablo

-----Original Message-----
From: David Koch [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 4, 2012 11:56
To: [EMAIL PROTECTED]
Subject: Fixing badly distributed table manually.

Hello,

A couple of questions regarding balancing of a table's data in HBase.

a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster? I guess I could search .META. but I haven't figured out how to use filters from shell.
b) What constitutes a "badly distributed" table and how can I re-balance manually?
c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

As for a) I tried running this script:

https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb

like so:

hbase org.jruby.Main ./list_regions.rb <_my_table>

but I get

ArgumentError: wrong number of arguments (1 for 2)
  (root) at ./list_regions.rb:60

If someone more proficient notices an obvious fix, I'd be glad to hear about it.

Why do I ask? I have the impression that one of the tables on our HBase cluster is not well distributed. When running a MapReduce job on this table, the load average on a single node is very high, whereas all other nodes are almost idle. It is the only table where this behavior is observed. Other MapReduce jobs result in slightly elevated load averages on several machines.

Thank you,

/David