Here is a thread on doing a proper incremental copy with Export/Import:
I am a fan of this one as it is well laid out. Breaking it up for your
use case should be pretty easy.
CopyTable should work just as well:
If you follow the approach above, it really comes down to preference.
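To make the incremental idea concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the linked thread. It just prints commands: it splits a time range into consecutive windows and builds either one Export/Import pair or one CopyTable call per window. The table names, paths and window sizes are hypothetical. The argument order follows the usage strings of the 0.92-era Export, Import and CopyTable tools, so check each tool's usage output on your version before running anything. Also note that a time-range scan only sees live cells. Rows deleted on the source after their window was copied will still exist in the copy.

```python
# Hypothetical helper: plans incremental HBase Export/Import (or CopyTable)
# runs over consecutive time windows. Timestamps are epoch milliseconds.
# Windows are half-open [start, end) because HBase time ranges exclude the end.
from typing import List, Tuple

EXPORT = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT = "org.apache.hadoop.hbase.mapreduce.Import"
COPYTABLE = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def time_windows(start_ms: int, end_ms: int, step_ms: int) -> List[Tuple[int, int]]:
    """Split [start_ms, end_ms) into consecutive, non-overlapping windows."""
    if step_ms <= 0 or end_ms <= start_ms:
        raise ValueError("need step_ms > 0 and end_ms > start_ms")
    windows = []
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step_ms, end_ms)
        windows.append((lo, hi))
        lo = hi
    return windows


def export_import_commands(src_table: str, dest_table: str, base_dir: str,
                           start_ms: int, end_ms: int, step_ms: int,
                           versions: int = 1) -> List[Tuple[str, str]]:
    """One (export, import) command pair per window; paths are hypothetical."""
    pairs = []
    for lo, hi in time_windows(start_ms, end_ms, step_ms):
        out_dir = f"{base_dir}/{src_table}-{lo}-{hi}"
        # Export args: <table> <outputdir> <versions> <starttime> <endtime>
        export_cmd = f"hbase {EXPORT} {src_table} {out_dir} {versions} {lo} {hi}"
        # Import args: <table> <inputdir>; dest_table should be the pre-split table
        import_cmd = f"hbase {IMPORT} {dest_table} {out_dir}"
        pairs.append((export_cmd, import_cmd))
    return pairs


def copytable_command(src_table: str, dest_table: str, lo: int, hi: int) -> str:
    """CopyTable equivalent for a single window (same cluster, new table name)."""
    return (f"hbase {COPYTABLE} --starttime={lo} --endtime={hi} "
            f"--new.name={dest_table} {src_table}")


if __name__ == "__main__":
    # Bulk passes first, then a short catch-up pass right before cutover.
    for exp, imp in export_import_commands("old", "new", "/tmp/incr", 0, 2000, 1000):
        print(exp)
        print(imp)
    print(copytable_command("old", "new", 2000, 2500))
```

The last, small window is meant to be the catch-up pass you run once writes to the old table stop. Keeping it short is what keeps the downtime short.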
On Tue, Oct 16, 2012 at 1:16 PM, Shrijeet Paliwal wrote:
> Hi Kevin,
> Thanks for answering. What are your thoughts on CopyTable vs. Export/Import
> for my use case? Is one tool less likely than the other to copy
> inconsistent data?
> I want to do an incremental copy of a live cluster to minimize downtime.
> On Tue, Oct 16, 2012 at 8:47 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> > Shrijeet,
> > I think a better approach would be to create a pre-split table and then
> > do the export/import. This saves you from having to script the merges,
> > which can end badly for META if done wrong.
> > On Mon, Oct 15, 2012 at 5:31 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:
> > > We moved to 0.92.2 some time ago and, along with that, increased the max
> > > file size setting to 4GB (from 2GB). Also, an application-triggered
> > > cleanup operation deleted lots of unwanted rows.
> > > These two changes combined have left us with lots of regions that are
> > > smaller than the desired size.
> > >
> > > Merging regions two at a time seems time-consuming and will be hard to
> > > automate. https://issues.apache.org/jira/browse/HBASE-1621 automates
> > > merging, but it is not stable.
> > >
> > > I am interested in hearing about other approaches folks have
> > > tried. What do you guys think about a CopyTable-based approach (old
> > > ---CopyTable---> new, then rename new to old)?
> > >
> > > -Shrijeet
> > >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera