Also, depending on the version of HBase you are running, you may have
to bring down the whole cluster to merge, not just the table:
On Tue, Jul 17, 2012 at 7:26 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> You shouldn't have empty regions. Using a timestamp will give you
> regions that are always half filled, except the last one, to which you
> are writing the current time range. The moment that fills up, it will
> split and you'll again be writing to the last region. How did you end up
> with empty regions? Did you pre-split?
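To make the point above concrete, here is a toy model of it, not HBase's actual split policy. It assumes keys arrive in strictly increasing order, that a region splits at its midpoint once it holds a fixed number of rows, and that a retention cutoff removes old rows without removing their regions. The names `simulate` and `expire` and the row counts are made up for illustration.

```python
def simulate(num_rows, max_rows_per_region):
    """Append increasing keys; split the last region at its midpoint when full."""
    regions = [[]]  # each region is a list of keys in ascending order
    for key in range(num_rows):
        last = regions[-1]
        last.append(key)  # an increasing key always lands in the last region
        if len(last) >= max_rows_per_region:
            mid = len(last) // 2
            regions[-1] = last[:mid]     # left half never receives writes again
            regions.append(last[mid:])   # right half becomes the new write target
    return regions


def expire(regions, cutoff):
    """Apply retention: drop keys below cutoff, but keep the (possibly empty) regions."""
    return [[k for k in region if k >= cutoff] for region in regions]


if __name__ == "__main__":
    regions = simulate(num_rows=1020, max_rows_per_region=100)
    print([len(r) for r in regions])
    aged = expire(regions, cutoff=400)
    print(sum(1 for r in aged if not r), "of", len(aged), "regions are empty")
```

In this model every closed region stays half full, and once retention passes over a region it stays in the table with no rows in it. That would be one way to end up with many empty regions without pre-splitting.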
> On Jul 17, 2012, at 7:15 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> > Find a different row key?
> > The problem with merging regions is that once you merge the regions, any
> > net new regions will still have the same problem. So you'll have to merge
> > again, and again, and again.
> > You're always filling to the left of the last key.
> > In order to merge, you have to take the table offline. At least that's
> > my understanding. So it's not a good thing.
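As a sketch of what "a different row key" could look like: one option is to lead the key with something other than time, so that every region keeps receiving new rows while old ones age out. The example below assumes each record carries some identifier, here a hypothetical `source_id`, with many distinct values that are written at similar rates. Nothing in the thread says such a field exists, and the layout (hash prefix, id, zero-padded timestamp) is only an illustration.

```python
import hashlib


def row_key(source_id: str, timestamp_ms: int) -> bytes:
    """Build a key that leads with a hash of source_id, followed by the timestamp."""
    # A stable hash (not Python's per-process randomized hash()) spreads sources
    # evenly over the key space, so region boundaries do not follow time.
    prefix = hashlib.md5(source_id.encode("utf-8")).hexdigest()[:4]
    # Zero-padding keeps rows of one source sorted by time within its key range.
    return f"{prefix}|{source_id}|{timestamp_ms:013d}".encode("utf-8")


if __name__ == "__main__":
    print(row_key("host-a", 1342500000000))
    print(row_key("host-a", 1342500300000))
```

The trade-off is that a scan over a time range across all sources can no longer be a single contiguous scan. It becomes one scan per source, or a full scan with a filter.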
> > On Jul 17, 2012, at 11:08 AM, Ionut Ignatescu wrote:
> >> My use case: I have several tables whose keys start with a timestamp.
> >> These tables have data retention set to 30 days.
> >> Table size is around 1 TB (3 TB replicated) and data is inserted regularly
> >> (every 5 minutes, ~200 MB is inserted).
> >> File size is set to 1 GB. I have had these tables in use for almost half an
> >> and now a table has around 6k partitions and 40% of them are empty.
> >> The problem: the number of regions per region server is now pretty high.
> >> Questions:
> >> Which approach is better?
> >> - to merge adjacent empty partitions into a bigger one?
> >> - to merge empty partitions to non-empty partitions?
> >> Also, I'm wondering why region merging is not part of major compactions
> >> and why it's necessary to stop the entire fleet to solve this problem.
> >> Regards,
> >> Ionut I.
Customer Operations Engineer, Cloudera