Well, with sequential data, new rows are always added at the same end of the key space, so only the last region takes writes. After a split, the daughter region that stops receiving writes stays about half full and never changes.
When you say you’re creating 20 new regions… is that from the volume of data or are you still ‘pre-splitting’ the table?
Also, if you increase the maximum region size, regions will split less often, so new regions get created more slowly.
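To make the half-full effect concrete, here is a rough toy model in Python. It is not HBase code: `simulate` and `max_rows` are made-up names, a "region" is just a sorted list, and the split point is assumed to be the midpoint of the region.

```python
import bisect

def simulate(keys, max_rows):
    """Toy model: insert keys into sorted 'regions' and split a region
    at its midpoint once it holds more than max_rows rows.
    Returns the number of rows in each region, in key order."""
    regions = [[]]  # each region is a sorted list of row keys
    for k in keys:
        # The first region covers everything below the second region's
        # start key; region i (i >= 1) starts at its own first key.
        starts = [r[0] for r in regions[1:]]
        idx = bisect.bisect_right(starts, k)
        bisect.insort(regions[idx], k)
        if len(regions[idx]) > max_rows:
            r = regions[idx]
            mid = len(r) // 2
            # Replace the full region with its two daughters.
            regions[idx:idx + 1] = [r[:mid], r[mid:]]
    return [len(r) for r in regions]

# Sequential keys: every region except the last stops growing after its split.
small = simulate(range(100), max_rows=10)
# Same data with a larger split threshold: fewer regions get created.
large = simulate(range(100), max_rows=20)
print(small)
print(large)
```

With sequential keys, every region except the last one ends up at half the split threshold, and raising the threshold cuts the number of regions created for the same volume of data.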
How are you accessing your data?
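You could bucket the data by prepending a byte derived from a hash of the row key, but then you’d have a hard time doing a range scan: you’d have to scan every bucket, and even a single lookup only works if you know the sequential id so you can recompute the prefix.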
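Here is a minimal sketch of that kind of bucketing, again in plain Python rather than against the HBase client API. The bucket count of 16, the 8-byte big-endian id encoding, the choice of MD5, and the helper names `salted_key` and `scan_ranges` are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count

def salted_key(row_id, num_buckets=NUM_BUCKETS):
    """Prefix the 8-byte big-endian id with one byte taken from its hash."""
    raw = row_id.to_bytes(8, "big")
    salt = hashlib.md5(raw).digest()[0] % num_buckets
    return bytes([salt]) + raw

def scan_ranges(start_id, stop_id, num_buckets=NUM_BUCKETS):
    """Return one (start_key, stop_key) pair per bucket.
    A range scan over ids has to fan out to every bucket, because
    consecutive ids are scattered across the prefixes."""
    lo = start_id.to_bytes(8, "big")
    hi = stop_id.to_bytes(8, "big")
    return [(bytes([b]) + lo, bytes([b]) + hi) for b in range(num_buckets)]

# A point lookup needs the id, so the prefix can be recomputed.
key = salted_key(50)
# A range scan over ids 0..99 turns into NUM_BUCKETS separate scans.
ranges = scan_ranges(0, 100)
```

The writes now spread over all the buckets instead of hammering the last region, but the client has to issue one scan per bucket and merge the results itself.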
This is one of the use cases I envisioned when I talked about it in HBASE-12853.
It abstracts the bucketing by doing it on the server side.
The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
michael_segel (AT) hotmail.com