HBase, mail # user - Merge large number of regions


Re: Merge large number of regions
Kevin O'dell 2012-10-17, 14:39
Shrijeet,

  Here is a post on doing a proper incremental with Import:

http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html
I am a fan of this one as it is well laid out.  Breaking this up for your
use case should be pretty easy.

CopyTable should work just as easily -
http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/
If you follow either of the above, it really comes down to a matter of
preference.
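
For illustration, here is a minimal Python sketch of breaking an incremental copy into time windows. It only builds the command lines and does not run anything. The table names, the output directory, and the window size are hypothetical. It assumes the CopyTable options --starttime, --endtime and --new.name and the Export argument order (table, output dir, versions, start time, end time) as printed by those tools' usage messages, and it assumes the start time is inclusive and the end time exclusive, so back-to-back windows neither overlap nor leave gaps. Check both against your HBase version before relying on it.

```python
from typing import List, Tuple

COPYTABLE = "org.apache.hadoop.hbase.mapreduce.CopyTable"
EXPORT = "org.apache.hadoop.hbase.mapreduce.Export"


def time_windows(start_ms: int, end_ms: int, step_ms: int) -> List[Tuple[int, int]]:
    """Split [start_ms, end_ms) into consecutive half-open windows."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    windows = []
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)  # last window may be shorter
        windows.append((lo, hi))
        lo = hi
    return windows


def copytable_cmd(src: str, dst: str, lo: int, hi: int) -> List[str]:
    """Command line for copying one time window into a table on the same cluster."""
    return [
        "hbase", COPYTABLE,
        f"--starttime={lo}",
        f"--endtime={hi}",
        f"--new.name={dst}",
        src,
    ]


def export_cmd(table: str, out_dir: str, lo: int, hi: int, versions: int = 1) -> List[str]:
    """Command line for exporting one time window.

    Each window gets its own output directory (a layout chosen here for
    illustration), since a MapReduce job will not write into an existing one.
    """
    return ["hbase", EXPORT, table, f"{out_dir}/{lo}-{hi}", str(versions), str(lo), str(hi)]


if __name__ == "__main__":
    # Hypothetical example: three one-hour windows, timestamps in milliseconds.
    hour = 60 * 60 * 1000
    for lo, hi in time_windows(0, 3 * hour, hour):
        print(" ".join(copytable_cmd("old_table", "new_table", lo, hi)))
```

The idea is to copy the bulk of the data in one large window while the cluster stays live, then run progressively smaller windows so that the final catch-up pass, done with writes paused, is short.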

On Tue, Oct 16, 2012 at 1:16 PM, Shrijeet Paliwal
<[EMAIL PROTECTED]>wrote:

> Hi Kevin,
>
> Thanks for answering. What are your thoughts on CopyTable vs. export/import
> for my use case? Is one tool less likely than the other to copy
> inconsistent data?
>
> I wish to do an incremental copy of a live cluster to minimize downtime.
>
> On Tue, Oct 16, 2012 at 8:47 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>
> > Shrijeet,
> >
> >  I think a better approach would be to pre-split a new table and then do
> > the export/import.  This will save you from having to script the merges,
> > which can end badly for META if done wrong.
> >
> > On Mon, Oct 15, 2012 at 5:31 PM, Shrijeet Paliwal
> > <[EMAIL PROTECTED]>wrote:
> >
> > > We moved to 0.92.2 some time ago and, with that, increased the max file
> > > size setting to 4GB (from 2GB). Also, an application-triggered cleanup
> > > operation deleted lots of unwanted rows.
> > > These two combined have gotten us to a state where lots of regions are
> > > smaller than the desired size.
> > >
> > > Merging regions two at a time seems time-consuming and will be hard to
> > > automate. https://issues.apache.org/jira/browse/HBASE-1621 automates
> > > merging, but it is not stable.
> > >
> > > I am interested in knowing about other possible approaches folks have
> > > tried. What do you guys think about a CopyTable-based approach? (old
> > > ---copyTable---> new, and then rename new to old)
> > >
> > > -Shrijeet
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
> >
>

--
Kevin O'Dell
Customer Operations Engineer, Cloudera