I ended up writing a tool that merges a table's regions down to a target
number of regions. For example, if you want to go from N --> N/8, the tool
figures out the grouping and merges the regions in one pass. I will put it up
in a GitHub repo soon and share it here.
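
To make the grouping step concrete, here is a minimal sketch of one way it could be computed. This is not the tool's actual code: the function name group_adjacent_regions and the region names are hypothetical, and the sketch assumes the regions are already listed in start-key order so that each group contains only adjacent regions.

```python
def group_adjacent_regions(regions, target_count):
    """Split an ordered list of regions into target_count contiguous groups.

    Assumes `regions` is sorted by start key, so every group holds only
    adjacent regions. Group sizes differ by at most one.
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    if target_count > len(regions):
        raise ValueError("target_count cannot exceed the number of regions")

    # Every group gets `base` regions; the first `extra` groups get one more.
    base, extra = divmod(len(regions), target_count)

    groups = []
    start = 0
    for i in range(target_count):
        size = base + (1 if i < extra else 0)
        groups.append(regions[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical region names, for illustration only.
    regions = ["region-%03d" % i for i in range(104)]
    for group in group_adjacent_regions(regions, 13):
        print(group[0], "..", group[-1], "(%d regions)" % len(group))
```

Each resulting group would then be merged into a single region. How the merges themselves are carried out is outside the scope of this sketch.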
The sad part of this approach is the downtime required. It's taking over 2
hours on my test cluster, where the table is less than 30% of the production
table's size. In absolute terms, the table has over 100 regions holding 20GB
of compressed (lzo) data, and I am merging it down to 20 or so.
Is there a better way to achieve this? If not, should I open a JIRA to
explore the possibility of running the Merge util on a disabled table rather
than having to shut down the entire cluster? It would also be great to
skip compaction when merging the table and run it as a later step,
since that can happen online. Just throwing some ideas out here.
On Tue, Jul 2, 2013 at 11:22 AM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:
> Hi Viral,
> It was working fine when I did it. I'm not sure you can still apply it
> to a recent HBase version because some of the code has changed. But I can
> take a look to see if I can rebase it...