HBase >> mail # user >> Merge large number of regions


Re: Merge large number of regions
Shrijeet,

  Here is a post on doing a proper incremental backup with Export/Import:

http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html
I am a fan of this one as it is well laid out.  Breaking this up for your
use case should be pretty easy.

CopyTable should work just as easily -
http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/
If you follow the above it is really going to be a matter of preference.
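
To make the incremental idea concrete, here is a minimal Python sketch that splits a time range into contiguous windows and builds the matching Export/Import and CopyTable command lines for each one. The table names, output directory, and timestamps are hypothetical placeholders, and the script only prints commands rather than running them. It assumes the documented positional arguments of the Export and Import MapReduce jobs and the `--starttime`, `--endtime`, and `--new.name` options of CopyTable; check the usage output of your HBase version before relying on it.

```python
import shlex

EXPORT = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT = "org.apache.hadoop.hbase.mapreduce.Import"
COPYTABLE = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def time_windows(start_ms, end_ms, step_ms):
    """Split [start_ms, end_ms) into contiguous half-open windows."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    windows = []
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)
        windows.append((lo, hi))
        lo = hi  # the next window starts exactly where this one ended
    return windows


def export_import_cmds(table, new_table, out_dir, lo, hi, versions=1):
    """Export one time window of `table`, then import it into `new_table`."""
    # Export usage: <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
    export = ["hbase", EXPORT, table, out_dir, str(versions), str(lo), str(hi)]
    # Import usage: <tablename> <inputdir>; the target table must already exist.
    load = ["hbase", IMPORT, new_table, out_dir]
    return export, load


def copytable_cmd(table, new_table, lo, hi):
    """Copy one time window of `table` into `new_table` on the same cluster."""
    return ["hbase", COPYTABLE,
            f"--starttime={lo}", f"--endtime={hi}",
            f"--new.name={new_table}", table]


if __name__ == "__main__":
    # Hypothetical values: two one-day windows, timestamps in milliseconds.
    day_ms = 24 * 60 * 60 * 1000
    start = 1350000000000
    for i, (lo, hi) in enumerate(time_windows(start, start + 2 * day_ms, day_ms)):
        # A separate output directory per window, since a MapReduce job
        # will not write into an output directory that already exists.
        out_dir = f"/backup/old_table/window-{i}"
        export, load = export_import_cmds("old_table", "new_table", out_dir, lo, hi)
        print(shlex.join(export))
        print(shlex.join(load))
        print(shlex.join(copytable_cmd("old_table", "new_table", lo, hi)))
```

Either path works per window; the final window would be run after writes to the old table have stopped, so the last pass picks up everything remaining.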

On Tue, Oct 16, 2012 at 1:16 PM, Shrijeet Paliwal
<[EMAIL PROTECTED]> wrote:

> Hi Kevin,
>
> Thanks for answering. What are your thoughts on CopyTable vs. Export/Import,
> considering my use case? Will one tool have a lower chance of copying
> inconsistent data than the other?
>
> I wish to do an incremental copy of a live cluster to minimize downtime.
>
> On Tue, Oct 16, 2012 at 8:47 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>
> > Shrijeet,
> >
> >  I think a better approach would be to create a pre-split table and then
> > do the export/import.  This will save you from having to script the
> > merges, which can end badly for META if done wrong.
> >
> > On Mon, Oct 15, 2012 at 5:31 PM, Shrijeet Paliwal
> > <[EMAIL PROTECTED]> wrote:
> >
> > > We moved to 0.92.2 some time ago and with that, increased the max file
> > > size setting to 4GB (from 2GB). Also, an application-triggered cleanup
> > > operation deleted lots of unwanted rows.
> > > These two combined have gotten us to a state where lots of regions are
> > > smaller than the desired size.
> > >
> > > Merging regions two at a time seems time consuming and will be hard to
> > > automate. https://issues.apache.org/jira/browse/HBASE-1621 automates
> > > merging, but it is not stable.
> > >
> > > I am interested in knowing about other possible approaches folks have
> > > tried. What do you guys think about a CopyTable-based approach? (old
> > > ---copyTable---> new and then rename new to old)
> > >
> > > -Shrijeet
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
> >
>

--
Kevin O'Dell
Customer Operations Engineer, Cloudera