HBase user mailing list: Export / Import and table splits


Re: Export / Import and table splits
I almost forgot: for 0.94.6.1 and newer releases, you can:

1. Take a snapshot of the original table.
2. Export the snapshot to the target cluster.
3. Clone the exported snapshot to a new table.
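
A minimal sketch of those three steps, assuming HBase shell access on both
clusters; the snapshot name and the target cluster's root-dir URL below are
illustrative:

  # On the source cluster, in the HBase shell:
  hbase> snapshot 'source_table', 'source_snapshot'

  # From the command line, copy the snapshot to the target cluster:
  $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot source_snapshot \
      -copy-to hdfs://target-cluster:8020/hbase

  # On the target cluster, in the HBase shell:
  hbase> clone_snapshot 'source_snapshot', 'target_table'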

Cheers

On Tue, May 7, 2013 at 4:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Currently the Import tool doesn't create the table on the target cluster.
> If we choose approach #2, the Import tool should be enhanced with
> table-creation capability.
>
> Cheers
>
>
> On Tue, May 7, 2013 at 4:02 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> @Mohammad: The end goal is really about the splits more than the model,
>> so I don't think Lars' options are good for this use case.
>> @Mike: I agree that things were not configured correctly. The user should
>> have split the table before doing the import. I like the idea of looking
>> at the files to get the region boundaries. That way you don't need to
>> have the source_table still there...
>>
>> So we have two different things here:
>> 1) a shell command to duplicate a table's structure, and
>> 2) an option on the Import command to split the table regions based on
>> the file names.
>>
>> If we agree on that I will open one JIRA for each...
>>
>> JM
>>
>> 2013/5/7 Michael Segel <[EMAIL PROTECTED]>:
>> > Silly question...
>> >
>> > If you're doing a simple export, then you end up with all of your prior
>> > regions as separate files in a directory, right?
>> >
>> > So in theory, you could find the first row and the last complete row of
>> > each file and then do your pre-splits based on the start key and end key
>> > that you find.
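
A rough sketch of that idea against the 0.94-era client API, using just the
first key of each part file, and assuming Export's default output format
(SequenceFiles keyed by row, one part-* file per region); the path, table
name, and column family below are illustrative:

  import java.util.Arrays;
  import java.util.TreeSet;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.SequenceFile;

  public class PresplitFromExport {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      FileSystem fs = FileSystem.get(conf);
      Path exportDir = new Path("/export/source_table"); // illustrative path

      // The first key of each part file is the lowest row its map task
      // scanned, i.e. (with one mapper per region) the region's start key.
      TreeSet<byte[]> splits = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
      for (FileStatus stat : fs.listStatus(exportDir)) {
        if (!stat.getPath().getName().startsWith("part-")) continue;
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, stat.getPath(), conf);
        try {
          ImmutableBytesWritable key = new ImmutableBytesWritable();
          if (reader.next(key)) {
            splits.add(Arrays.copyOfRange(key.get(), key.getOffset(),
                key.getOffset() + key.getLength()));
          }
        } finally {
          reader.close();
        }
      }
      // Drop the lowest key: it is the table's start, not a split point.
      splits.pollFirst();

      HTableDescriptor desc = new HTableDescriptor("target_table");
      desc.addFamily(new HColumnDescriptor("f1")); // illustrative family
      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.createTable(desc, splits.toArray(new byte[splits.size()][]));
      admin.close();
    }
  }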
>> >
>> > That would be your tool, so to speak.
>> >
>> > But as to the point that reading these files back in will cause you to
>> > crash your RS and HBase?
>> > That doesn't sound like it's well tuned or right.
>> >
>> > HTH
>> > -Mike
>> >
>> > On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >> I am not aware of a tool which can pre-split a table using another
>> >> table's region boundaries as a template.
>> >>
>> >> Such a tool would be nice to have.
>> >>
>> >> Cheers
>> >>
>> >> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <
>> >> [EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> When we are doing an export, we are only exporting the data. Then when
>> >>> we are importing it back, we need to make sure the table is pre-split
>> >>> correctly, or else we might hotspot some servers.
>> >>>
>> >>> If you simply export and then import without pre-splitting at all, you
>> >>> will most probably bring some servers down, because they will be
>> >>> overwhelmed with splits and compactions.
>> >>>
>> >>> Do we have any tool to pre-split a table the same way another table is
>> >>> already pre-split?
>> >>>
>> >>> Something like
>> >>>> duplicate 'source_table', 'target_table'
>> >>>
>> >>> which would create a new table called 'target_table' with exactly the
>> >>> same parameters as 'source_table' and the same region boundaries?
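
A client-side sketch of what such a 'duplicate' could do, against the
0.94-era Java API (table names are illustrative):

  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DuplicateTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable source = new HTable(conf, "source_table");

      // One start key per region; drop the first (the empty start key),
      // since createTable() expects split points, not region starts.
      byte[][] startKeys = source.getStartKeys();
      byte[][] splits = Arrays.copyOfRange(startKeys, 1, startKeys.length);

      // Copy the schema and rename it for the new table.
      HTableDescriptor desc = new HTableDescriptor(source.getTableDescriptor());
      desc.setName(Bytes.toBytes("target_table"));

      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.createTable(desc, splits);
      admin.close();
      source.close();
    }
  }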
>> >>>
>> >>> If we don't have one, would it be useful to have one?
>> >>>
>> >>> Or even something like:
>> >>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
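
For comparison, the shell's create command already accepts explicit split
points via SPLITS (and a SPLITS_FILE variant), so a SPLITS_MODEL option
would fit that existing pattern; for example, with illustrative keys:

  hbase> create 'target_table', 'f1', {SPLITS => ['10000', '20000', '30000']}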
>> >>>
>> >>>
>> >>> JM
>> >>>
>> >
>>
>
>