HBase >> mail # user >> HBase Region move() and Data Locality


Re: HBase Region move() and Data Locality
Thanks for the response!

We are currently migrating analytics data from an old MySQL setup to a new
HBase-backed architecture.  We have a bunch of versions of the data running
at once, for testing, beta, live, etc., so we have 63 tables right now and
6451 regions hosted on 12 EC2 m1.xlarge servers.  As the tables grow at
various rates depending on where they are in historic processing, each one
tends to grow on a single machine, as you mentioned.

I'll give your approach a shot.  Also open to other suggestions.

Thanks,

Bryan

On Mon, Mar 5, 2012 at 1:19 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> So what's going on on that cluster exactly? You have a lot of tables
> of various sizes and they tend to grow on only one machine?
>
> One simple trick to get good balancing for a table is to disable it,
> balance the cluster, then re-enable it. It will be distributed
> properly at that point.
>
> J-D
>
> On Mon, Mar 5, 2012 at 8:02 AM, Bryan Beaudreault
> <[EMAIL PROTECTED]> wrote:
> > Hey all,
> >
> > We are running on cdh3u2 (soon to upgrade to 3u3), and we notice that
> > regions are balanced solely based on the number of regions per region
> > server, with no regard for horizontal scaling of tables.  This was mostly
> > fine with a small number of regions, but as our cluster reaches thousands
> > of regions we are often finding an entire table (or large part of one)
> > on a single region server.  This seems suboptimal.
> >
> > We were looking into options for this, and noticed that it is fixed in
> > 0.94 (possibly 0.92?), but we are wanting to stick with CDH for now.  With
> > that in mind, we needed alternatives, and found the HBaseAdmin
> > move(byte[], byte[]) function
> > <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#move(byte[], byte[])>.
> > The documentation doesn't mention it, but I'm wondering if using this
> > function ruins locality.  Without the locality problem, I was thinking
> > of creating a utility that allowed us to scramble the regions of a table
> > then called balance(), which would hopefully result in a better spread of
> > regions for a table.  However, I don't want to ruin our performance by
> > ruining the locality.
> >
> > The HBase book mentions that locality is achieved through major
> > compactions.  If I have the opportunity to take some downtime, would it
> > be feasible to scramble all of the regions, run balance() to make sure all
> > regionservers have about the same number, then a major compaction to fix
> > locality?
> >
> > Thanks!
> >
> > Bryan
>
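
For illustration, the "spread each table's regions, then move them" idea from the original question could be planned along the following lines. This is only a sketch under stated assumptions, not code from the thread: the class TableSpreadPlanner, its Move type, and the sample region and server names are all hypothetical, and the planner does not talk to HBase at all. It only decides which region should go to which server; applying the plan would mean calling the HBaseAdmin move(byte[], byte[]) function mentioned above once per entry. Because every region is reassigned without regard to where it currently lives, locality would be lost until a major compaction, as discussed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: plans a round-robin spread of each table's regions
 * across the available region servers. Pure planning logic, no HBase calls.
 */
public class TableSpreadPlanner {

    /** One planned move: an encoded region name and its destination server. */
    public static final class Move {
        public final String encodedRegionName;
        public final String destServerName;

        Move(String encodedRegionName, String destServerName) {
            this.encodedRegionName = encodedRegionName;
            this.destServerName = destServerName;
        }
    }

    /**
     * Assigns regions to servers in rotation, table by table. Within a table,
     * consecutive regions go to consecutive servers, so per-table counts on any
     * two servers differ by at most one. The rotation position is carried over
     * between tables so that small tables do not all start on the first server.
     */
    public static List<Move> plan(Map<String, List<String>> regionsByTable,
                                  List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no region servers given");
        }
        List<Move> moves = new ArrayList<>();
        int next = 0;
        for (List<String> regions : regionsByTable.values()) {
            for (String region : regions) {
                moves.add(new Move(region, servers.get(next)));
                next = (next + 1) % servers.size();
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Sample data only; real names would come from the cluster.
        Map<String, List<String>> regionsByTable = new LinkedHashMap<>();
        regionsByTable.put("events", Arrays.asList("r-a1", "r-a2", "r-a3"));
        regionsByTable.put("users", Arrays.asList("r-b1", "r-b2"));
        List<String> servers = Arrays.asList("rs1", "rs2");

        for (Move m : plan(regionsByTable, servers)) {
            // With a live cluster, this is where the thread's
            // HBaseAdmin move(byte[], byte[]) would be called, passing the
            // encoded region name and the destination server name as bytes.
            System.out.println(m.encodedRegionName + " -> " + m.destServerName);
        }
    }
}
```

A plan like this touches every region, so it fits the "take some downtime, then major compact" scenario better than a live rebalance; a gentler variant would skip regions that already sit on their planned server.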