Re: Re-balancer on datanodes that run hbase regions servers
Harsh J 2013-02-17, 19:54
It is a bad idea only because it will temporarily disturb the perfect
locality of the regions hosted by each RegionServer. Locality is
restored only once the next major compaction has rewritten all regions,
and both events (the balancer run and the compaction) cause a small
performance dip and extra network use and I/O until they are done.
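To make the locality point concrete, here is a toy Python sketch. It is not HBase or HDFS code, and the host names and block layout are made up for illustration. It computes the share of a RegionServer's store-file blocks that have a replica on the RS's own host, before and after a balancer moves one replica away.

```python
def locality(rs_host, block_replicas):
    """Fraction of blocks with at least one replica on rs_host."""
    if not block_replicas:
        return 1.0
    local = sum(1 for hosts in block_replicas if rs_host in hosts)
    return local / len(block_replicas)


# Hypothetical layout: four blocks written by the RS on host "rs1".
# HDFS places the first replica on the writer's node, so every block
# starts out with a local copy.
blocks = [
    {"rs1", "dn4", "dn7"},
    {"rs1", "dn5", "dn8"},
    {"rs1", "dn6", "dn9"},
    {"rs1", "dn4", "dn8"},
]
before = locality("rs1", blocks)

# Assumed balancer action: move the rs1 replica of block 0 to dn9.
blocks[0] = (blocks[0] - {"rs1"}) | {"dn9"}
after = locality("rs1", blocks)

print(before, after)
```

Reads of the moved block now have to go over the network until a major compaction rewrites the data on rs1 again.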
There's no way to escape the fact that if you write more HBase data,
the 3 RS nodes are bound to fill up faster than the others. What we
could do as an enhancement, to help rebalance the remaining nodes
without affecting the RS, is to provide an exclude-nodes feature in
the Balancer. By asking the Balancer to exclude the RS nodes, you
could rebalance the rest of the cluster without causing a performance
problem on the RS in the meantime.
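A rough Python sketch of what excluding nodes would mean for the Balancer's bookkeeping. This is a toy model, not the Balancer's code: the node names and utilization figures are invented, `classify` is a hypothetical helper, and it assumes the cluster average is computed over the non-excluded nodes only. The 10% threshold mirrors the Balancer's default.

```python
def classify(utilization, threshold=10.0, exclude=()):
    """Return (over, under) utilized nodes among those not excluded.

    A node is over-utilized if it sits more than `threshold` percentage
    points above the mean of the included nodes, under-utilized if it
    sits more than `threshold` points below it.
    """
    included = {n: u for n, u in utilization.items() if n not in exclude}
    if not included:
        return [], []
    avg = sum(included.values()) / len(included)
    over = sorted(n for n, u in included.items() if u > avg + threshold)
    under = sorted(n for n, u in included.items() if u < avg - threshold)
    return over, under


# Hypothetical disk usage in percent; rs* nodes also host RegionServers.
usage = {
    "rs1": 85.0, "rs2": 82.0, "rs3": 88.0,
    "dn4": 70.0, "dn5": 40.0, "dn6": 45.0, "dn7": 50.0,
}

# Without exclusion the RS nodes are the ones that get drained.
print(classify(usage))
# With the RS nodes excluded, only DataNode-only hosts are touched.
print(classify(usage, exclude={"rs1", "rs2", "rs3"}))
```

In the first call the RS nodes are the move sources, which is exactly what hurts locality; in the second, blocks would only move between dn4 and dn5.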
Most clusters run the RS+DN pair across all nodes, so this scenario of
an imbalance won't really occur there.
I filed https://issues.apache.org/jira/browse/HDFS-4509 with some
ideas you could use (see comments).
On Sun, Dec 9, 2012 at 4:57 PM, Jabir Ahmed <[EMAIL PROTECTED]> wrote:
> Our cluster has around 12 data-nodes
> 9 nodes run datanodes + task trackers
> 3 nodes run datanodes + region servers
> 1 Namenode and Jobtracker
> In this kind of cluster setup, is it advisable to run a re-balancer, since
> running a balancer affects the performance of hbase?