HBase >> mail # user >> question about pre-splitting regions

question about pre-splitting regions

I am creating a new table and want to pre-split the regions and am seeing
some weird behavior.

My rowkey is designed as a composite of multiple fixed-length byte arrays
separated by a control character (for simplicity's sake, say the
separator is an underscore). The prefix of the rowkey is deterministic
(it is always 8 bytes long), and I know beforehand how many different
prefixes I will see in the near future. The values after the prefix are not
deterministic. I wanted to create a pre-split table based on the number of
prefix combinations that I know.

I ended up doing something like this:
hbaseAdmin.createTable(tableName, Bytes.toBytes(1L),
Bytes.toBytes(maxCombinationPrefixValue), maxCombinationPrefixValue)
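
One possible explanation can be sketched with plain JDK code, under an assumption the post does not confirm: that the rowkey prefix is written as ASCII text rather than as an 8-byte long. Bytes.toBytes(long) produces 8 big-endian bytes, so the boundary keys passed above start with 0x00 bytes, while an ASCII digit starts at 0x30. The class name, the helper names, the value 1000 and the sample rowkeys below are all hypothetical; ByteBuffer.putLong stands in for Bytes.toBytes(long).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class KeyOrderSketch {
    // Same byte layout as Bytes.toBytes(long): 8 bytes, big-endian.
    static byte[] longKey(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts rowkeys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Hypothetical rowkey whose prefix is ASCII text, e.g. "00000042_rest".
    static byte[] asciiRow(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Hypothetical rowkey whose prefix is an 8-byte binary long.
    static byte[] binaryRow(long prefix, String rest) {
        byte[] tail = rest.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(Long.BYTES + tail.length)
                .putLong(prefix).put(tail).array();
    }

    public static void main(String[] args) {
        byte[] endKey = longKey(1000L); // stand-in for maxCombinationPrefixValue
        // An ASCII-prefixed row sorts after the end key.
        System.out.println(compareUnsigned(asciiRow("00000042_rest"), endKey) > 0);
        // A binary-prefixed row sorts between the start key and the end key.
        System.out.println(compareUnsigned(binaryRow(42L, "_rest"), endKey) < 0);
    }
}
```

If the prefix really is ASCII, every rowkey would sort after the end key, and rows that sort after the end key belong to the last region. That would match the single hot region described here, but it is only a guess until the actual key encoding is known.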

The createTable call worked fine and, as expected, created that number of
regions. But when I write data to the table, I still see all the writes
hitting a single region instead of hitting different regions based on the
prefix. Is my thinking of splitting by prefix values flawed? Do I have to
split by some real rowkeys (though it's impossible for me to know what
rowkeys will show up except the row prefix which is much more

I suspect I have a flawed understanding of the createTable API and that
this is causing the issue. Should I use the createTable variant that takes
a byte[][] of split keys instead of the one I am using right now?
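
A minimal sketch of what the byte[][] variant could look like, using only the JDK so it runs without a cluster. It assumes, purely for illustration, that each prefix is encoded as 8 zero-padded ASCII digits; prefixKey, splitKeys and regionIndexFor are hypothetical helpers, not HBase API. The resulting array is the kind of value one would pass as the split keys to createTable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ExplicitSplitSketch {
    // Hypothetical encoding: the prefix value as 8 zero-padded ASCII digits.
    static byte[] prefixKey(long prefix) {
        return String.format(Locale.ROOT, "%08d", prefix)
                .getBytes(StandardCharsets.US_ASCII);
    }

    // One boundary per prefix from 2 to maxPrefix, which yields maxPrefix regions.
    static byte[][] splitKeys(int maxPrefix) {
        byte[][] keys = new byte[maxPrefix - 1][];
        for (int p = 2; p <= maxPrefix; p++) {
            keys[p - 2] = prefixKey(p);
        }
        return keys;
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts rowkeys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Index of the region that would hold the row: the count of split keys <= row.
    static int regionIndexFor(byte[][] splits, byte[] row) {
        int idx = 0;
        for (byte[] s : splits) {
            if (compareUnsigned(row, s) >= 0) idx++;
        }
        return idx;
    }

    public static void main(String[] args) {
        byte[][] splits = splitKeys(4);
        String[] rows = {"00000001_abc", "00000003_xyz", "00000004_q"};
        for (String r : rows) {
            byte[] row = r.getBytes(StandardCharsets.US_ASCII);
            System.out.println(r + " -> region " + regionIndexFor(splits, row));
        }
    }
}
```

The point of the sketch is that the split keys are built with the same encoding as the rowkey prefix, so each prefix maps to its own region. If the real prefix is binary, prefixKey would have to produce those bytes instead.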

Any suggestions/pointers?

Viral Bajaria 2013-02-15, 04:08