Re: How does HBase perform load balancing?
Amandeep Khurana 2010-05-08, 21:48
The Yahoo! Research link is the most recent one afaik... That's the one
submitted to SoCC '10.
On Sat, May 8, 2010 at 3:36 AM, Kevin Apte <[EMAIL PROTECTED]> wrote:
> Are these the right links for the Yahoo! benchmarks?
> On Sat, May 8, 2010 at 3:00 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
> > hey,
> > HBase currently uses region count to load balance. Regions are
> > assigned in a semi-randomish order to other regionservers.
> > The paper is somewhat correct in that we are not moving data around
> > aggressively, because then people would write in complaining we move
> > data around too much :-)
> > So a few notes: HBase is not a key-value store, it's a tabular data
> > store, which maintains key order, and allows the easy construction of
> > left-match key indexes.
> > One other thing... if you are using a DHT (e.g. Cassandra), when a node
> > fails the load moves to the other servers in the ring-segment. For
> > example if you have N=3 and you lose a node in a segment, the load of
> > a server would move to 2 other servers. Your monitoring system should
> > probably be tied into the DHT topology since if a second node fails in
> > the same ring you probably want to take action. Ironically, nodes in
> > Cassandra are special (unlike the publicly stated info) and they
> > "belong" to a particular ring segment and cannot be used to store
> > other data. There are tools to do node swap in, but you want your
> > cluster management to be as automated as possible.
> > Compare that to a Bigtable architecture, where the load of a failed
> > regionserver is spread evenly across the rest of the cluster. No node has a
> > special role in HDFS and HBase; any data can be hosted and served from
> > any node. As nodes fail, as long as you have enough nodes to serve
> > the load you are in good shape. The HDFS missing block report lets you
> > know when you have lost too many nodes. Nodes have no special role and
> > can host and hold any data.
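
To put rough numbers on the failover comparison above, here is a toy sketch. It is not Cassandra's or HBase's actual placement logic: the function names, the uniform starting load, and the idea that a failed ring node hands its load to exactly two successors are all assumptions made for illustration.

```python
def ring_failover(load, failed, neighbors=2):
    """Toy DHT model (assumption): the failed node's load is split
    evenly among the next `neighbors` nodes on the ring."""
    n = len(load)
    result = list(load)
    share = load[failed] / neighbors
    result[failed] = 0.0
    for step in range(1, neighbors + 1):
        result[(failed + step) % n] += share
    return result


def spread_failover(load, failed):
    """Toy Bigtable-style model (assumption): the failed server's load
    is split evenly among all surviving servers."""
    share = load[failed] / (len(load) - 1)
    return [0.0 if i == failed else x + share for i, x in enumerate(load)]


# Ten servers, each carrying 100 units of load; server 3 fails.
load = [100.0] * 10
ring = ring_failover(load, failed=3)
spread = spread_failover(load, failed=3)

print(max(ring))    # 150.0: two neighbors each absorb half of the lost load
print(max(spread))  # each of the nine survivors absorbs one ninth
```

In this model the hottest surviving node in the ring case carries 50% more load, while in the evenly spread case every survivor carries about 11% more.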
> > In the future we want to add load balancing based on
> > requests/second. We have all the requisite data and architecture, but
> > other things are more important right now. Pure region count load
> > balancing tends to work fairly well in practice.
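
A minimal sketch of what balancing purely on region count could look like. This is not the HBase master's code: `balance_by_region_count`, the server and region names, and the "within one region of each other" stopping rule are hypothetical, and the random pick only mimics the "semi-randomish" choice described above.

```python
import random


def balance_by_region_count(assignments, seed=0):
    """Move regions from the fullest server to the emptiest one until
    region counts differ by at most one. Mutates `assignments`
    (server -> list of region names) and returns the moves made."""
    rng = random.Random(seed)
    moves = []
    while True:
        most = max(assignments, key=lambda s: len(assignments[s]))
        least = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[most]) - len(assignments[least]) <= 1:
            break
        # Pick a region at random; request rate and data size are ignored.
        region = assignments[most].pop(rng.randrange(len(assignments[most])))
        assignments[least].append(region)
        moves.append((region, most, least))
    return moves


# rs3 plays the role of a newly added, empty region server.
cluster = {
    "rs1": [f"r{i}" for i in range(8)],
    "rs2": [f"r{i}" for i in range(8, 12)],
    "rs3": [],
}
moves = balance_by_region_count(cluster)
print({server: len(regions) for server, regions in cluster.items()})
# {'rs1': 4, 'rs2': 4, 'rs3': 4}
```

Because only counts are compared, a server holding a few very hot regions looks the same as one holding idle regions, which is the gap that balancing on requests/second would address.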
> > 2010/5/8 MauMau <[EMAIL PROTECTED]>:
> > > Hello,
> > >
> > > I got the following error when I sent the mail.
> > >
> > > Technical details of permanent failure:
> > > Google tried to deliver your message, but it was rejected by the recipient
> > > domain. We recommend contacting the other email provider for further
> > > information about the cause of this error. The error that the other server
> > > returned was: 552 552 spam score (5.2) exceeded threshold (state 18).
> > >
> > > The original mail might have been too long, so let me split it and send
> > > again.
> > >
> > >
> > > I'm comparing HBase and Cassandra, which I think are the most promising
> > > distributed key-value stores, to determine which one to choose for
> > > future OLTP and data analysis.
> > > I found the following benchmark report by Yahoo! Research, which evaluates
> > > HBase, Cassandra, PNUTS, and sharded MySQL.
> > >
> > > http://wiki.apache.org/hadoop/Hbase/DesignOverview
> > >
> > > The above report refers to HBase 0.20.3.
> > > Reading this and HBase's documentation, two questions about load balancing
> > > and replication have arisen. Could anyone give me any information to
> > > help answer these questions?
> > >
> > > [Q1] Load balancing
> > > Does HBase move regions to a newly added region server (logically, not
> > > physically on storage) immediately? If not immediately, when?
> > > On what criteria does the master unassign and assign regions among
> > > servers? CPU load, read/write request rates, or just the number of regions
> > > the region servers are handling?