Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> How does HBase perform load balancing?


Copy link to this message
-
Re: How does HBase perform load balancing?
The Yahoo! research link is the most recent one afaik... Thats the one
submitted to SOCC'10
On Sat, May 8, 2010 at 3:36 AM, Kevin Apte <[EMAIL PROTECTED]
> wrote:

> Are these the good links for the Yahoo Benchmarks?
> http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
> http://research.*yahoo*.com/files/ycsb.pdf
>
> Kevin
>
>
> On Sat, May 8, 2010 at 3:00 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
>
> > hey,
> >
> > HBase currently uses region count to load balance.  Regions are
> > assigned in a semi-randomish order to other regionservers.
> >
> > The paper is somewhat correct in that we are not moving data around
> > aggressively, because then people would write in complaining we move
> > data around too much :-)
> >
> > So a few notes, HBase is not a key-value store, its a tabluar data
> > store, which maintains key order, and allows the easy construction of
> > left-match key indexes.
> >
> > One other thing... if you are using a DHT (eg: cassandra), when a node
> > fails the load moves to the other servers in the ring-segment. For
> > example if you have N=3 and you lose a node in a segment, the load of
> > a server would move to 2 other servers.  Your monitoring system should
> > probably be tied into the DHT topology since if a second node fails in
> > the same ring you probably want to take action. Ironically nodes in
> > cassandra are special (unlike the publicly stated info) and they
> > "belong" to a particular ring segment and cannot be used to store
> > other data. There are tools to do node swap in, but you want your
> > cluster management to be as automated as possible.
> >
> > Compared to a bigtable architecture, the load of a failed regionserver
> > is evenly spread across the entire rest of the cluster.  No node has a
> > special role in HDFS and HBase, any data can be hosted and served from
> > any node.  As nodes fail, as long as you have enough nodes to serve
> > the load you are in good shape. The HDFS missing block report lets you
> > know when you have lost too many nodes. Nodes have no special role and
> > can host and hold any data.
> >
> > In the future we want to add a load balancing based on
> > requests/second.  We have all the requisite data and architecture, but
> > other things are up more important right now.  Pure region count load
> > balancing tends to work fairly well in practice.
> >
> > 2010/5/8 MauMau <[EMAIL PROTECTED]>:
> > > Hello,
> > >
> > > I got the following error when I sent the mail.
> > >
> > > Technical details of permanent failure:
> > > Google tried to deliver your message, but it was rejected by the
> > recipient
> > > domain. We recommend contacting the other email provider for further
> > > information about the cause of this error. The error that the other
> > server
> > > returned was: 552 552 spam score (5.2) exceeded threshold (state 18).
> > >
> > > The original mail might have been too long, so let me split it and send
> > > again.
> > >
> > >
> > > I'm comparing HBase and Cassandra, which I think are the most promising
> > > distributed key-value stores, to determine which one to choose for the
> > > future OLTP and data analysis.
> > > I found the following benchmark report by Yahoo! Research which
> evalutes
> > > HBase, Cassandra, PNUTS, and sharded MySQL.
> > >
> > > http://wiki.apache.org/hadoop/Hbase/DesignOverview
> > >
> > > The above report refers to HBase 0.20.3.
> > > Reading this and HBase's documentation, two questions about load
> > balancing
> > > and replication have risen. Could anyone give me any information to
> help
> > > solve these questions?
> > >
> > > [Q1] Load balancing
> > > Does HBase move regions to a newly added region server (logically, not
> > > physically on storage) immediately? If not immediately, what timing?
> > > On what criteria does the master unassign and assign regions among
> region
> > > servers? CPU load, read/write request rates, or just the number of
> > regions
> > > the region servers are handling?
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB