You shouldn't have empty regions. With a timestamp as the leading part
of the key, every region stays half full except the last one, which is
the one receiving writes for the current time range. The moment that
one fills up, it splits, and you are again writing to the new last
region. How did you end up with empty regions? Did you pre-split?
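
To make the reasoning concrete, here is a toy Python model of the behaviour described above. It is not HBase code: `Region`, `simulate`, `expire` and `count_empty` are names made up for illustration, row count stands in for region size, and the split point is assumed to be the median row. It also shows one possible (unconfirmed) source of empty regions: if the 30-day retention mentioned in the original message expires every row in an old region, the region itself still exists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    start_key: int                                  # first timestamp the region covers
    rows: List[int] = field(default_factory=list)   # timestamps currently stored


def simulate(total_ticks: int, max_rows: int) -> List[Region]:
    """Insert one row per tick with key == tick; split a region at max_rows."""
    regions = [Region(start_key=0)]
    for now in range(total_ticks):
        last = regions[-1]
        last.rows.append(now)          # a monotonic key always sorts into the last region
        if len(last.rows) >= max_rows:
            mid = len(last.rows) // 2  # assumed split point: the median row
            regions.append(Region(start_key=last.rows[mid], rows=last.rows[mid:]))
            last.rows = last.rows[:mid]
    return regions


def expire(regions: List[Region], now: int, ttl: int) -> None:
    """Drop rows older than the TTL; region boundaries stay as they are."""
    cutoff = now - ttl
    for region in regions:
        region.rows = [t for t in region.rows if t >= cutoff]


def count_empty(regions: List[Region]) -> int:
    """Number of regions that still exist but hold no rows."""
    return sum(1 for r in regions if not r.rows)
```

In this model, `simulate(1000, 100)` yields 20 regions of 50 rows each, i.e. half of `max_rows`. Calling `expire(regions, now=1000, ttl=200)` afterwards leaves 16 of those 20 regions with no rows at all, and nothing in the model ever removes them.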
On Jul 17, 2012, at 7:15 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> Find a different row key?
> The problem with merging regions is that once you merge the regions, any net new regions will still have the same problem. So you'll have to merge again, and again and again.
> You're always filling to the left of the last key.
> In order to merge, you have to take the table offline. At least that's my understanding. So it's not a good thing.
> On Jul 17, 2012, at 11:08 AM, Ionut Ignatescu wrote:
>> My use case: I have several tables whose keys start with a timestamp. Also,
>> these tables have data retention set to 30 days.
>> Table size is around 1 TB (3 TB replicated) and data is inserted regularly
>> (~200 MB every 5 minutes).
>> File size is set to 1 GB. I have had these tables in use for almost half a year
>> and now one table has around 6k partitions, 40% of which are empty.
>> The problem: the number of regions per region server is now pretty high.
>> Which approach is better?
>> - to merge adjacent empty partitions into a bigger one?
>> - to merge empty partitions into non-empty partitions?
>> Also, I'm wondering why region merging is not part of major compactions and
>> why it's necessary to stop the
>> entire fleet to solve this problem.
>> Ionut I.