HBase >> mail # user >> Merge large number of regions


Re: Merge large number of regions
Thanks Kevin! Very useful pointers.

On Wed, Oct 17, 2012 at 7:39 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> Shrijeet,
>
>   Here is a thread on doing a proper incremental with Import:
>
>
> http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html
> I am a fan of this one as it is well laid out.  Breaking this up for your
> use case should be pretty easy.
>
> CopyTable should work just as easily -
> http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/
>
>
> If you follow the above it is really going to be a matter of preference.
>
> On Tue, Oct 16, 2012 at 1:16 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:
>
> > Hi Kevin,
> >
> > Thanks for answering. What are your thoughts on CopyTable vs
> > export/import, considering my use case? Will one tool have a lesser
> > chance of copying inconsistent data than the other?
> >
> > I wish to do an incremental copy of a live cluster to minimize downtime.
> >
> > On Tue, Oct 16, 2012 at 8:47 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> >
> > > Shrijeet,
> > >
> > >  I think a better approach would be to pre-split a new table and then
> > > do the export/import.  This will save you from having to script the
> > > merges, which can end badly for META if done wrong.
> > >
> > > On Mon, Oct 15, 2012 at 5:31 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:
> > >
> > > > We moved to 0.92.2 some time ago and, with that, increased the max
> > > > file size setting to 4GB (from 2GB). Also, an application-triggered
> > > > cleanup operation deleted lots of unwanted rows.
> > > > These two combined have gotten us to a state where lots of regions
> > > > are smaller than the desired size.
> > > >
> > > > Merging regions two at a time seems time consuming and will be hard
> > > > to automate. https://issues.apache.org/jira/browse/HBASE-1621
> > > > automates merging, but it is not stable.
> > > >
> > > > I am interested in knowing about other possible approaches folks have
> > > > tried. What do you guys think about a CopyTable-based approach? (old
> > > > ---CopyTable---> new, and then rename new to old)
> > > >
> > > > -Shrijeet
> > > >
> > >
> > >
> > >
> > > --
> > > Kevin O'Dell
> > > Customer Operations Engineer, Cloudera
> > >
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
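
For anyone following the CopyTable route discussed above, here is a minimal sketch of how the incremental passes could be planned. It only builds the command lines and does not run anything. The table names and the `catch_up_plan` helper are hypothetical; the `--starttime`, `--endtime` and `--new.name` options are the ones printed by CopyTable's own usage message. It assumes the destination table already exists (ideally pre-split, as Kevin suggests), that timestamps are epoch milliseconds, and that the first checkpoint is recorded before the full copy starts. It does not deal with deletes issued on the source while the copy is running.

```python
from typing import List, Optional

COPYTABLE = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def copytable_args(table: str, new_name: str,
                   start_ms: Optional[int] = None,
                   end_ms: Optional[int] = None) -> List[str]:
    """Build one CopyTable command line as an argv list."""
    args = ["hbase", COPYTABLE]
    if start_ms is not None:
        args.append(f"--starttime={start_ms}")
    if end_ms is not None:
        args.append(f"--endtime={end_ms}")
    args.append(f"--new.name={new_name}")
    args.append(table)  # the source table name goes last
    return args


def catch_up_plan(table: str, new_name: str,
                  checkpoints_ms: List[int]) -> List[List[str]]:
    """Plan a full copy followed by time-bounded catch-up passes.

    checkpoints_ms holds strictly increasing wall-clock marks. The first
    is taken just before the full copy starts, each later one just before
    the corresponding catch-up pass starts.
    """
    if not checkpoints_ms:
        raise ValueError("need at least one checkpoint")
    if any(a >= b for a, b in zip(checkpoints_ms, checkpoints_ms[1:])):
        raise ValueError("checkpoints must be strictly increasing")

    # Pass 0: unbounded full copy while the source stays live.
    plan = [copytable_args(table, new_name)]
    # Middle passes: copy only cells written between two checkpoints.
    for start, end in zip(checkpoints_ms, checkpoints_ms[1:]):
        plan.append(copytable_args(table, new_name, start, end))
    # Final pass: open-ended, meant to run after writes have been stopped.
    plan.append(copytable_args(table, new_name, checkpoints_ms[-1]))
    return plan


if __name__ == "__main__":
    # Hypothetical table names and checkpoints, for illustration only.
    for cmd in catch_up_plan("old_table", "new_table",
                             [1350400000000, 1350486400000]):
        print(" ".join(cmd))
```

The windows deliberately overlap with whatever the full copy already picked up. Re-copying a cell with the same timestamp should be harmless, so only the last, short pass needs the application to be quiet.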