Re: Offline merge tool question
Thanks Stack.  We are going to test this on a test table in QA, but I'd
still like a fallback plan if something goes wrong when we eventually do it
in prod.

One idea I had was to snapshot the table, clone from the snapshot, and
perform the merge on the result of the clone.  I imagine I'd first want to
major compact the clone, so that we rewrite all of the linked files into
new files.  I also see at the end of this blog post (
that merging regions on a snapshot table can cause data loss.

Does my approach sound reasonable?  Disable the table, snapshot it, create
a clone from the snapshot, major compact the clone, run the merge on the
clone, enable the clone, and test; if that fails, fall back to the original
table.
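
To make the ordering concrete, here is a rough dry-run sketch of that sequence in Java.  It only builds the list of HBase shell commands for review and never touches a cluster.  The class name, method name, and table/snapshot names are hypothetical.  The extra "disable clone" step is an assumption on my part: it presumes the clone comes up enabled and that the merge needs the table offline.  The merge step itself is a placeholder for the custom HMerge-based tool, not a real shell command.

```java
import java.util.ArrayList;
import java.util.List;

// Dry-run sketch: produces the ordered steps of the fallback plan as text.
// Nothing here connects to HBase; the output is meant to be read and checked.
class MergeFallbackPlan {

    // Returns the plan for one table. All names passed in are illustrative.
    static List<String> planSteps(String table, String snapshot, String clone) {
        List<String> steps = new ArrayList<>();
        steps.add("disable '" + table + "'");
        steps.add("snapshot '" + table + "', '" + snapshot + "'");
        steps.add("clone_snapshot '" + snapshot + "', '" + clone + "'");
        // Major compaction is requested asynchronously, so a real run would
        // need to wait for it to finish before moving on.
        steps.add("major_compact '" + clone + "'");
        // Assumption: the clone is enabled after cloning and the merge
        // expects a disabled table, so take the clone offline first.
        steps.add("disable '" + clone + "'");
        // Placeholder for the custom HMerge-based tool (hypothetical).
        steps.add("# run custom HMerge-based merge tool on '" + clone + "'");
        steps.add("enable '" + clone + "'");
        return steps;
    }

    public static void main(String[] args) {
        for (String step : planSteps("mytable", "mytable_snap", "mytable_merge_test")) {
            System.out.println(step);
        }
    }
}
```

The original table is never modified after the snapshot, which is what makes it usable as the fallback if testing the clone fails.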

On Wed, Aug 14, 2013 at 1:32 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 13, 2013 at 5:17 PM, Bryan Beaudreault wrote:
> > I'm running cdh4.2 hbase 0.94.2, and am looking to merge some regions in a
> > table.  Looking at Merge.java, it seems to require that the entire cluster
> > be offline.  However, I also notice an HMerge.java which doesn't appear to
> > do the same validation.
> >
> > Two questions:
> >
> > 1) Why does Merge.java validate the entire cluster is down, as opposed to
> > just the single table being disabled?
> >
>
> It is dumb/simple/old.
>
> > 2) Could I write my own tool that uses HMerge, so as to merge regions in
> > the disabled table without bringing the whole cluster down?
> >
>
> Yes.  You can't do much harm if table is offline.
>
> St.Ack