HBase, mail # user - Export / Import and table splits


Re: Export / Import and table splits
Michael Segel 2013-05-07, 22:34
Silly question...

If you're doing a simple export, then you end up with all of your prior regions as separate files in a directory, right?

So in theory, you could find the first row and the last complete row of each file and then do your pre-splits based on the start key and end key that you find.  

That would be your tool, so to speak.
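
A minimal sketch of that idea in plain Java, with no HBase dependency. It assumes, as suggested above, that the export produced one file per source region and that the first row key of each file has already been read by some other means. The class name `SplitKeysSketch` and the method `splitKeys` are hypothetical names used only for illustration. Note that the first row of a file is only an approximation of the original region start key, but splitting on it still partitions the exported data the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitKeysSketch {

    /** Unsigned lexicographic byte comparison, the order HBase uses for row keys. */
    static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    /**
     * Takes the first row key found in each exported file (in any order) and
     * returns the split keys for the target table. The smallest first row is
     * dropped, because N regions need only N - 1 split points.
     */
    static List<byte[]> splitKeys(List<byte[]> firstRowPerFile) {
        List<byte[]> sorted = new ArrayList<>(firstRowPerFile);
        sorted.sort(ROW_ORDER);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            // Skip duplicates so that every split key is distinct.
            if (ROW_ORDER.compare(sorted.get(i), sorted.get(i - 1)) != 0) {
                splits.add(sorted.get(i));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up first rows of three exported files, deliberately unsorted.
        List<byte[]> firstRows = new ArrayList<>();
        for (String s : new String[] {"row-5000", "row-0001", "row-2500"}) {
            firstRows.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] key : splitKeys(firstRows)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

The resulting keys could then be converted to a `byte[][]` and passed to `HBaseAdmin.createTable(HTableDescriptor, byte[][])` when creating the target table, before running the import.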

But as for the point that reading these files back in will crash your region servers and HBase?
That doesn't sound like it's well tuned or set up right.

HTH
-Mike

On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> I am not aware of a tool which can pre-split a table using another table's
> region boundaries as a template.
>
> Such a tool would be nice to have.
>
> Cheers
>
> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]
>> wrote:
>
>> Hi,
>>
>> When we do an export, we only export the data. Then, when
>> we import that data back, we need to make sure the table is
>> pre-split correctly, or else we might hotspot some servers.
>>
>> If you simply export and then import without pre-splitting at all, you
>> will most probably bring some servers down because they will be
>> overwhelmed with splits and compactions.
>>
>> Do we have any tool to pre-split a table the same way another table is
>> already pre-split?
>>
>> Something like
>>> duplicate 'source_table', 'target_table'
>>
>> Which would create a new table called 'target_table' with exactly the
>> same parameters as 'source_table' and the same region boundaries?
>>
>> If we don't have one, would it be useful to have one?
>>
>> Or even something like:
>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>>
>>
>> JM
>>