HBase user mailing list
HBase Region move() and Data Locality
Hey all,

We are running on cdh3u2 (soon to upgrade to cdh3u3), and we have noticed that
regions are balanced solely by the number of regions per region server, with
no regard for how each table's regions are spread across servers. This was
mostly fine with a small number of regions, but now that our cluster has
thousands of regions, we often find an entire table (or a large part of one)
on a single region server. This seems suboptimal, since all requests for that
table then land on one server.
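To make the problem concrete, here is a toy model in plain Python (no HBase involved) of a balancer that only equalizes region counts per server. The greedy "move one region from the busiest to the idlest server" loop, the server names, and the "table,N" region ids are illustrative assumptions, not HBase's actual balancer code.

```python
from collections import Counter

def balance_by_count(assignment, servers):
    """Toy count-only balancer: repeatedly move one region from the most
    loaded server to the least loaded one until the counts differ by at
    most one. It never looks at which table a region belongs to."""
    assignment = dict(assignment)  # region -> server; leave the caller's dict alone
    while True:
        counts = Counter({s: 0 for s in servers})
        counts.update(assignment.values())
        busiest = max(servers, key=lambda s: counts[s])
        idlest = min(servers, key=lambda s: counts[s])
        if counts[busiest] - counts[idlest] <= 1:
            return assignment
        # Pick a region off the busiest server; the choice ignores its table.
        region = min(r for r, s in assignment.items() if s == busiest)
        assignment[region] = idlest

def table_spread(assignment, table):
    """Regions of `table` per server; region ids look like 'table,N'."""
    return dict(Counter(s for r, s in assignment.items() if r.split(",")[0] == table))

# Eight regions from two tables, all starting on rs1.
servers = ["rs1", "rs2"]
start = {f"{t},{i}": "rs1" for t in ("t1", "t2") for i in range(4)}
after = balance_by_count(start, servers)
print(table_spread(after, "t1"), table_spread(after, "t2"))  # → {'rs2': 4} {'rs1': 4}
```

The per-server counts end up perfectly even (4 and 4), yet each table still lives entirely on one server, because nothing in a count-only policy rewards spreading a table.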

We looked into options for this and noticed that it is fixed in 0.94
(possibly 0.92?), but we want to stick with CDH for now. With that in
mind, we needed alternatives, and found the HBaseAdmin move(byte[], byte[])
method (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#move(byte[], byte[])).
The documentation doesn't say, but I'm wondering whether using this method
ruins data locality. If locality weren't a concern, I was thinking of writing
a utility that scrambles the regions of a table and then calls balancer(),
which would hopefully spread that table's regions more evenly. However, I
don't want to hurt our performance by ruining locality.
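The scramble utility could be planned roughly as follows. This is a self-contained toy model, not HBase client code: the function name, the seeded shuffle, and the round-robin dealing are hypothetical. In a real tool, each planned move would presumably become one HBaseAdmin move() call with the region's encoded name and the destination server name. Instead of a purely random scramble, this sketch shuffles each table's regions and then deals them round-robin, continuing the rotation across tables, so every table and the cluster as a whole end up within one region of even.

```python
import random
from collections import Counter, defaultdict

def plan_table_spread(assignment, servers, seed=0):
    """Hypothetical planner: shuffle each table's regions, then deal them
    round-robin across servers, continuing the rotation from one table to
    the next so total counts stay even too. Returns (new_assignment, moves),
    where each move is (region, from_server, to_server)."""
    rng = random.Random(seed)  # seeded so a plan can be reproduced
    by_table = defaultdict(list)
    for region in sorted(assignment):
        by_table[region.split(",")[0]].append(region)

    new_assignment, moves, slot = {}, [], 0
    for table in sorted(by_table):
        regions = by_table[table]
        rng.shuffle(regions)  # the "scramble" step
        for region in regions:
            target = servers[slot % len(servers)]
            slot += 1
            new_assignment[region] = target
            if assignment[region] != target:
                # In a real tool this would be one HBaseAdmin move() call (assumption).
                moves.append((region, assignment[region], target))
    return new_assignment, moves

servers = ["rs1", "rs2", "rs3"]
start = {f"t1,{i}": "rs1" for i in range(6)}
start.update({f"t2,{i}": "rs2" for i in range(5)})
plan, moves = plan_table_spread(start, servers)
for table in ("t1", "t2"):
    print(table, dict(Counter(s for r, s in plan.items() if r.startswith(table + ","))))
print(len(moves), "moves")
```

The shuffle only decides which particular regions land where; the round-robin dealing is what guarantees the spread. Because total counts also stay within one of each other, a purely count-based balancer run afterwards should find little or nothing to move, so it would not undo the per-table spread. A random scramble followed by balancer() would usually spread a table too, but only probabilistically.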

The HBase book mentions that locality is achieved through major
compactions. If I have the opportunity to take some downtime, would it be
feasible to scramble all of the regions, run balancer() so that every region
server has about the same number of regions, and then run a major compaction
to restore locality?
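A toy model of why a move can hurt locality and why a major compaction brings it back. It assumes HDFS's default block placement, which puts the first replica of each newly written block on the writer's own datanode; the Region class, host names, and replica sets are hypothetical, and real store files and compactions are far more involved. Reassigning a region only changes which server serves it; the existing files are not copied. In practice the compaction itself would be triggered with HBaseAdmin.majorCompact(tableName) or the shell's major_compact command.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    server: str  # region server currently hosting the region
    # Each store-file block, represented by the set of hosts holding a replica.
    blocks: list = field(default_factory=list)

def locality(region):
    """Fraction of the region's blocks with a replica on its hosting server."""
    if not region.blocks:
        return 1.0
    local = sum(1 for hosts in region.blocks if region.server in hosts)
    return local / len(region.blocks)

def move(region, dest):
    """Reassignment only changes who serves the region; no HDFS data is copied."""
    region.server = dest

def major_compact(region, other_replica_hosts):
    """Rewrite every block from the hosting server. Assumes HDFS's default
    placement: first replica on the writer's own datanode, the rest elsewhere.
    (Block count is kept unchanged for simplicity.)"""
    region.blocks = [{region.server, *other_replica_hosts} for _ in region.blocks]

r = Region("rs1", [{"rs1", "rs2", "rs3"} for _ in range(4)])
print(locality(r))  # → 1.0 (files were written by rs1)
move(r, "rs4")
print(locality(r))  # → 0.0 (rs4 holds no replica, so every read is remote)
major_compact(r, ["rs5", "rs6"])
print(locality(r))  # → 1.0 (the compaction rewrote the files from rs4)
```

In this model, moving the region to rs2 instead of rs4 would have kept locality at 1.0, since rs2 already holds a replica; with a replication factor of 3, a moved region can keep some locality by chance, but only a rewrite such as a major compaction makes it fully local again.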

Thanks!

Bryan