HBase, mail # user - Export / Import and table splits


Re: Export / Import and table splits
Ted Yu 2013-05-07, 23:11
Currently the Import tool doesn't create the table on the target cluster.
If we choose approach #2, the Import tool should be enhanced with table
creation capability.

Cheers

On Tue, May 7, 2013 at 4:02 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> @Mohammad: The end goal is really more about the splits than the
> model, so I don't think Lars' options are good for this use case.
> @Mike: I agree that things were not configured correctly. The user
> should have split the table before doing the import. I like the idea of
> looking at the files to get the region boundaries. That way you don't
> need to have the source_table still there...
>
> So we have 2 different things here.
> 1) a command on the shell to duplicate a table structure
> 2) an option on the import command to split the table regions based on
> the file names.
>
> If we agree on that I will open one JIRA for each...
>
> JM
>
> 2013/5/7 Michael Segel <[EMAIL PROTECTED]>:
> > Silly question...
> >
> > If you're doing a simple export, then you end up with all of your prior
> > regions as separate files in a directory, right?
> >
> > So in theory, you could find the first row and the last complete row of
> > each file and then do your pre-splits based on the start key and end key
> > that you find.
> >
> > That would be your tool so to speak.
> >
> > But to the point that reading back in these files will cause you to
> > crash your RS and HBase?
> > That doesn't sound like it's well tuned or right.
> >
> > HTH
> > -Mike
> >
> > On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> >> I am not aware of a tool which can pre-split a table using another
> >> table's region boundaries as a template.
> >>
> >> Such a tool would be nice to have.
> >>
> >> Cheers
> >>
> >> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>>
> >>> When we are doing an export, we are only exporting the data. Then when
> >>> we are importing that back, we need to make sure the table is
> >>> pre-split correctly, otherwise we might hotspot some servers.
> >>>
> >>> If you simply export then import without pre-splitting at all, you
> >>> will most probably bring some servers down because they will be
> >>> overwhelmed with splits and compactions.
> >>>
> >>> Do we have any tool to pre-split a table the same way another table is
> >>> already pre-split?
> >>>
> >>> Something like
> >>>> duplicate 'source_table', 'target_table'
> >>>
> >>> Which will create a new table called 'target_table' with exactly the
> >>> same parameters as 'source_table' and the same region boundaries?
> >>>
> >>> If we don't have one, would it be useful to have one?
> >>>
> >>> Or even something like:
> >>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
> >>>
> >>>
> >>> JM
> >>>
> >
>
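
Mike's suggestion in the thread (take the first and last row of each export file and pre-split on those boundaries) could be sketched roughly as below. This is only an illustration under several assumptions: the (first_row, last_row) pairs are taken as already extracted from the export files (reading the files themselves is not shown), row keys are plain ASCII strings without quote characters, and both function names are hypothetical. The only HBase-specific piece is the shell's `create ... SPLITS => [...]` form for creating a pre-split table.

```python
from typing import List, Tuple


def split_keys_from_file_ranges(ranges: List[Tuple[str, str]]) -> List[str]:
    """Derive pre-split keys from (first_row, last_row) pairs, one pair per export file.

    Assumption: each export file holds one contiguous key range, so the first
    row of every file except the lowest one can serve as a region boundary.
    """
    # Unique first rows in sorted order (ASCII keys assumed, so string order
    # matches byte order).
    first_rows = sorted({first for first, _last in ranges})
    # Drop the lowest first row: the first region always starts at the empty key.
    return first_rows[1:]


def shell_create_command(table: str, family: str, splits: List[str]) -> str:
    """Format an HBase shell 'create' statement that pre-splits the table."""
    quoted = ", ".join(f"'{key}'" for key in splits)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


if __name__ == "__main__":
    # Hypothetical key ranges, one per export file, in no particular order.
    file_ranges = [("m", "s"), ("a", "f"), ("t", "z")]
    splits = split_keys_from_file_ranges(file_ranges)
    print(shell_create_command("target_table", "f1", splits))
    # prints: create 'target_table', 'f1', SPLITS => ['m', 't']
```

The generated statement would be run in the HBase shell on the target cluster before starting the import, so the imported rows are spread over the same number of regions as there were export files.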