HBase >> mail # user >> Export / Import and table splits


Re: Export / Import and table splits
I'd go with the snapshots, since you can avoid all the I/O of the
import/export. But the consistency model is different, and you don't have
the start/end time option... you should delete the rows < tstart and > tend
after the clone.

Matteo
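
For reference, a minimal sketch of the snapshot route from the HBase shell
(the snapshot name 'source_table_snap' is hypothetical):

  # take a point-in-time snapshot of the source table; no data is copied
  hbase> snapshot 'source_table', 'source_table_snap'
  # clone the snapshot into a new table; the clone reuses the snapshot's
  # HFiles and inherits its region boundaries
  hbase> clone_snapshot 'source_table_snap', 'target_table'

Since there is no start/end time option here, rows with timestamps outside
[tstart, tend] would afterwards have to be deleted from the clone, e.g. via
a scan restricted with Scan.setTimeRange feeding Deletes.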

On Tue, May 14, 2013 at 1:48 AM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> Hi Jeremy,
>
> Thanks for sharing this.
>
> I will take a look at it, and will also most probably give the snapshot
> option a try....
>
> JM
>
> 2013/5/7 Jeremy Carroll <[EMAIL PROTECTED]>
>
> >
> >
> https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb
> >
> > I wrote a quick script to do it with Mechanize + Ruby. I have a new tool
> > which I'm polishing up that does the same thing in Python, but using the
> > HBase REST interface to get the data.
> >
> >
> > On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]
> > > wrote:
> >
> > > Hi,
> > >
> > > When we do an export, we only export the data. Then when we import
> > > it back, we need to make sure the table is pre-split correctly, or
> > > else we might hotspot some servers.
> > >
> > > If you simply export and then import without pre-splitting at all,
> > > you will most probably bring some servers down, because they will be
> > > overwhelmed with splits and compactions.
> > >
> > > Do we have any tool to pre-split a table the same way another table
> > > is already pre-split?
> > >
> > > Something like
> > > > duplicate 'source_table', 'target_table'
> > >
> > > Which would create a new table called 'target_table' with exactly the
> > > same parameters as 'source_table' and the same region boundaries?
> > >
> > > If we don't have such a tool, would it be useful to have one?
> > >
> > > Or even something like:
> > > > create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
> > >
> > >
> > > JM
> > >
> >
>
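
The 'duplicate' proposed above can be approximated from the HBase shell
(which is JRuby) by reading the source table's region start keys and
passing them as split points. This is only a sketch against the 0.94-era
client API, reusing the thread's example table names; note the caveat on
binary keys:

  # open the source table and collect one start key per region
  conf = org.apache.hadoop.hbase.HBaseConfiguration.create
  src = org.apache.hadoop.hbase.client.HTable.new(conf, 'source_table')
  # drop the first start key (the empty byte[]), which is not a valid
  # split point; toStringBinary only round-trips printable row keys,
  # truly binary keys would need to be passed through as raw bytes
  splits = src.getStartKeys.to_a.drop(1).map do |k|
    org.apache.hadoop.hbase.util.Bytes.toStringBinary(k)
  end
  # create the target table pre-split at the same region boundaries
  create 'target_table', 'f1', {SPLITS => splits}

Matteo's clone_snapshot route sidesteps this entirely, since a cloned
table inherits the snapshot's region boundaries as well as its data.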