HBase user mailing list


Re: Export / Import and table splits
I don't see much value in duplicating the table's structure, but IMHO, the jury is still out.
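For reference, here is a minimal sketch of what the requested `duplicate` behavior reduces to. It assumes the template table's region start keys have already been fetched, for example with `HTable.getStartKeys()` in the Java client. The helper names `split_keys_from_start_keys` and `duplicate_command` are hypothetical. The sketch assumes printable row keys with no quotes or backslashes; binary keys would need shell escaping. It only emits the HBase shell's `create ... SPLITS => [...]` form and does not copy any other table or column-family parameters.

```python
from typing import Iterable, List


def split_keys_from_start_keys(start_keys: Iterable[str]) -> List[str]:
    # The first region always starts at the empty key, which is not a valid
    # split point; every other region start key is one. Sorting matches
    # HBase's byte order for plain ASCII keys.
    return sorted({k for k in start_keys if k != ""})


def duplicate_command(target: str, families: List[str],
                      start_keys: Iterable[str]) -> str:
    # Hypothetical helper: renders an HBase shell `create` statement that
    # pre-splits `target` at the same boundaries as the template table.
    # Assumes keys and names contain no quotes or backslashes.
    splits = split_keys_from_start_keys(start_keys)
    fams = ", ".join("'%s'" % f for f in families)
    split_list = ", ".join("'%s'" % k for k in splits)
    return "create '%s', %s, SPLITS => [%s]" % (target, fams, split_list)


if __name__ == "__main__":
    # Made-up start keys standing in for those of 'source_table'.
    print(duplicate_command("target_table", ["f1"], ["", "g", "n", "t"]))
    # prints: create 'target_table', 'f1', SPLITS => ['g', 'n', 't']
```

Dropping the empty start key matters: it belongs to the first region of every table, so passing it as a split point would be meaningless.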
On May 7, 2013, at 6:02 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> @Mohammad: The end goal is really about the splits rather than
> the model. So I don't think Lars' options are good for this use case.
> @Mike: I agree that things were not configured correctly. The user should
> have split the table before doing the import. I like the idea of
> looking at the files to get the region boundaries. That way you don't
> need to have the source_table still there...
>
> So we have 2 different things here.
> 1) a command on the shell to duplicate a table structure
> 2) an option on the import command to split the table regions based on
> the file names.
>
> If we agree on that, I will open one JIRA for each...
>
> JM
>
> 2013/5/7 Michael Segel <[EMAIL PROTECTED]>:
>> Silly question...
>>
>> If you're doing a simple export, then you end up with all of your prior regions as separate files in a directory, right?
>>
>> So in theory, you could find the first row and the last complete row of each file and then do your pre-splits based on the start key and end key that you find.
>>
>> That would be your tool so to speak.
>>
>> But as for the claim that reading these files back in will crash your RS and HBase?
>> That doesn't sound like it's well tuned or right.
>>
>> HTH
>> -Mike
>>
>> On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> I am not aware of a tool which can pre-split table using another table's
>>> region boundaries as template.
>>>
>>> Such a tool would be nice to have.
>>>
>>> Cheers
>>>
>>> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> When we are doing an export, we are only exporting the data. Then when
>>>> we are importing it back, we need to make sure the table is
>>>> pre-split correctly, otherwise we might hotspot some servers.
>>>>
>>>> If you simply export and then import without pre-splitting at all, you
>>>> will most probably bring some servers down because they will be
>>>> overwhelmed with splits and compactions.
>>>>
>>>> Do we have any tool to pre-split a table the same way another table is
>>>> already pre-split?
>>>>
>>>> Something like
>>>>> duplicate 'source_table', 'target_table'
>>>>
>>>> Which would create a new table called 'target_table' with exactly the
>>>> same parameters as 'source_table' and the same region boundaries?
>>>>
>>>> If we don't have one, would it be useful to have one?
>>>>
>>>> Or even something like:
>>>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>>>>
>>>>
>>>> JM
>>>>
>>
>
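Here is a rough sketch of the file-based approach suggested above. It takes the lowest and highest row key of each exported file, checks that the files' key ranges do not overlap, and uses every file's lowest key except the first as a split point. Reading the actual Export output (SequenceFiles of `ImmutableBytesWritable`/`Result` pairs) is deliberately left out. Instead, the hypothetical `split_keys_from_export` function takes an in-memory mapping of file name to row keys. It assumes each file holds exactly one source region's rows. The keys in the example are made up.

```python
from typing import Dict, List


def split_keys_from_export(files: Dict[str, List[str]]) -> List[str]:
    # Hypothetical helper. `files` maps each exported file name to the row
    # keys it contains; each file is assumed to hold the rows of exactly one
    # source region.
    # (lowest_key, highest_key) per non-empty file, ordered by lowest key.
    ranges = sorted((min(rows), max(rows)) for rows in files.values() if rows)
    for (_, prev_high), (next_low, _) in zip(ranges, ranges[1:]):
        # Overlapping ranges mean the one-region-per-file assumption does not
        # hold, so no consistent boundaries can be derived.
        if next_low <= prev_high:
            raise ValueError("overlapping key ranges at %r" % next_low)
    # The lowest file goes into the first region (empty start key); every
    # other file's lowest key becomes a split point.
    return [low for low, _ in ranges[1:]]


if __name__ == "__main__":
    export = {  # made-up keys; file names only illustrate part-file naming
        "part-m-00000": ["apple", "banana", "cherry"],
        "part-m-00001": ["grape", "kiwi"],
        "part-m-00002": ["orange", "pear", "plum"],
        "part-m-00003": [],
    }
    print(split_keys_from_export(export))  # prints: ['grape', 'orange']
```

Using the lowest key actually present in each file, rather than the original region's start key, still gives usable boundaries. Each such key falls between the previous file's last row and the file's own rows. The resulting list could then be passed to the shell's `SPLITS` option when creating the target table before running the import.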