

Re: Export / Import and table splits
I almost forgot: for 0.94.6.1 and newer releases, you can:

1. take a snapshot of the original table
2. export the snapshot to target cluster
3. clone the exported snapshot to a new table.
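
The three steps above can be sketched as a small dry-run helper that only builds the commands and executes nothing. The `snapshot` and `clone_snapshot` shell commands and the `ExportSnapshot` tool are the ones shipped with HBase snapshots. The function name, the snapshot name, the table names, the `hdfs://target-nn:8020/hbase` URL and the mapper count are made-up placeholders, so treat this as an illustration rather than a tested procedure.

```python
import shlex


def snapshot_migration_plan(source_table, snapshot_name, target_root,
                            new_table, mappers=4):
    """Return (where, command) pairs for the three steps; nothing is run."""
    # Step 2 is a MapReduce job launched from the command line.
    export_cmd = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot_name,
        "-copy-to", target_root,   # HBase root dir of the target cluster
        "-mappers", str(mappers),
    ]
    return [
        # Step 1: take a snapshot of the original table.
        ("hbase shell, source cluster",
         f"snapshot '{source_table}', '{snapshot_name}'"),
        # Step 2: copy the snapshot to the target cluster.
        ("command line, source cluster",
         " ".join(shlex.quote(arg) for arg in export_cmd)),
        # Step 3: clone the exported snapshot into a new table.
        ("hbase shell, target cluster",
         f"clone_snapshot '{snapshot_name}', '{new_table}'"),
    ]


if __name__ == "__main__":
    # All names below are hypothetical examples.
    plan = snapshot_migration_plan("source_table", "source_table_snap",
                                   "hdfs://target-nn:8020/hbase",
                                   "target_table")
    for where, command in plan:
        print(f"[{where}] {command}")
```

Since the snapshot carries the region layout with it, the cloned table should come up with the same region boundaries as the original, which is the point of this approach compared with Export / Import.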

Cheers

On Tue, May 7, 2013 at 4:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Currently the Import tool doesn't create the table on the target cluster.
> If we choose approach #2, the Import tool should be enhanced with table
> creation capability.
>
> Cheers
>
>
> On Tue, May 7, 2013 at 4:02 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
>> @Mohammad: The end goal is really more about the splits than the
>> model. So I don't think Lars' options are good for this use case.
>> @Mike: I agree that things were not configured correctly. The user should
>> have split the table before doing the import. I like the idea of
>> looking at the files to get the region boundaries. That way you don't
>> need to have the source_table still there...
>>
>> So we have 2 different things here.
>> 1) a command on the shell to duplicate a table structure
>> 2) an option on the import command to split the table regions based on
>> the file names.
>>
>> If we agree on that I will open one JIRA for each...
>>
>> JM
>>
>> 2013/5/7 Michael Segel <[EMAIL PROTECTED]>:
>> > Silly question...
>> >
>> > If you're doing a simple export, then you end up with all of your prior
>> > regions as separate files in a directory, right?
>> >
>> > So in theory, you could find the first row and the last complete row of
>> > each file and then do your pre-splits based on the start key and end key
>> > that you find.
>> >
>> > That would be your tool so to speak.
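
A minimal sketch of the idea above, assuming the first and last row key of each exported file have already been extracted somehow (reading the exported files themselves is not shown). It treats each file's first row as the start key of a region and turns every start key except the lowest into a split point, then prints a `create` statement with `SPLITS` for the HBase shell. The function names, the sample keys and the family name `f1` are illustrative only, and the keys are assumed to be printable ASCII without quote characters.

```python
def split_points(file_ranges):
    """Derive split keys from (first_row, last_row) pairs, one per file.

    Keys are bytes, so sorting is bytewise and lexicographic.
    """
    ranges = sorted(file_ranges)
    # Refuse to guess if two files cover overlapping key ranges.
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        if next_first <= prev_last:
            raise ValueError("files overlap; cannot derive region boundaries")
    # The first region starts at the empty key, so skip the lowest start key.
    return [first for first, _ in ranges[1:]]


def create_statement(table, family, splits):
    """Build an HBase shell 'create' statement that pre-splits the table."""
    keys = ", ".join("'" + key.decode("ascii") + "'" for key in splits)
    return f"create '{table}', '{family}', {{SPLITS => [{keys}]}}"


if __name__ == "__main__":
    # Hypothetical boundaries found in three exported files.
    ranges = [(b"m000", b"s999"), (b"a000", b"f999"), (b"g000", b"l999")]
    print(create_statement("target_table", "f1", split_points(ranges)))
```

This only works if each exported file really corresponds to one region of the source table, which is the open question raised above.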
>> >
>> > But to the point that reading back in these files will cause you to
>> > crash your RS and HBase?
>> > That doesn't sound like it's well tuned or right.
>> >
>> > HTH
>> > -Mike
>> >
>> > On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >> I am not aware of a tool which can pre-split a table using another
>> >> table's region boundaries as a template.
>> >>
>> >> Such a tool would be nice to have.
>> >>
>> >> Cheers
>> >>
>> >> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> When we are doing an export, we are only exporting the data. Then when
>> >>> we import that back, we need to make sure the table is
>> >>> pre-split correctly, or else we might hotspot some servers.
>> >>>
>> >>> If you simply export then import without pre-splitting at all, you
>> >>> will most probably bring some servers down because they will be
>> >>> overwhelmed with splits and compactions.
>> >>>
>> >>> Do we have any tool to pre-split a table the same way another table is
>> >>> already pre-split?
>> >>>
>> >>> Something like
>> >>>> duplicate 'source_table', 'target_table'
>> >>>
>> >>> Which will create a new table called 'target_table' with exactly the
>> >>> same parameters as 'source_table' and the same region boundaries?
>> >>>
>> >>> If we don't have one, would it be useful to have one?
>> >>>
>> >>> Or even something like:
>> >>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>> >>>
>> >>>
>> >>> JM
>> >>>
>> >
>>
>
>