HBase >> mail # user >> Export / Import and table splits


Jean-Marc Spaggiari 2013-05-07, 22:23
Mohammad Tariq 2013-05-07, 22:33
Ted Yu 2013-05-07, 22:29
Michael Segel 2013-05-07, 22:34
Re: Export / Import and table splits
@Mohammad: The end goal is really about the splits rather than the
model, so I don't think Lars' options are a good fit for this use case.
@Mike: I agree that things were not configured correctly. The user
should have split the table before doing the import. I like the idea of
looking at the files to get the region boundaries. That way you don't
need to have the source_table still there...

So we have 2 different things here.
1) a command on the shell to duplicate a table structure
2) an option on the import command to split the table regions based on
the file names.

If we agree on that I will open one JIRA for each...
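
To make the second idea concrete, here is a minimal sketch in Python of how split keys could be derived once the first and last row of each exported file are known. Everything in it is an assumption for illustration: the function names are hypothetical, reading the rows out of the export files is left out, and it takes Mike's premise (one file per former region) at face value. It also assumes the row keys are printable ASCII, since the result is rendered with the shell's SPLITS option.

```python
def split_keys_from_file_ranges(file_ranges):
    """Derive split keys from (first_row, last_row) pairs, one pair per file.

    Hypothetical helper: assumes each exported file covers one former region.
    """
    # Collect the distinct first rows and put them in key order.
    first_rows = sorted({first for first, _last in file_ranges})
    # The lowest first row belongs to the first region, whose start key is
    # open-ended, so it is not a split point.
    return first_rows[1:]


def shell_create_command(table, family, split_keys):
    """Render a shell 'create' statement that pre-splits on the given keys.

    Assumes the keys are printable ASCII and contain no single quotes.
    """
    quoted = ", ".join("'%s'" % key.decode("ascii") for key in split_keys)
    return "create '%s', '%s', SPLITS => [%s]" % (table, family, quoted)


if __name__ == "__main__":
    # Made-up boundaries for three exported files, deliberately out of order.
    ranges = [(b"m", b"r"), (b"a", b"f"), (b"s", b"z")]
    keys = split_keys_from_file_ranges(ranges)
    print(shell_create_command("target_table", "f1", keys))
```

Only the first rows are used as boundaries here; the last rows are kept in the input in case someone wants to check the files for gaps or overlaps first.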

JM

2013/5/7 Michael Segel <[EMAIL PROTECTED]>:
> Silly question...
>
> If you're doing a simple export, then you end up with all of your prior regions as separate files in a directory, right?
>
> So in theory, you could find the first row and the last complete row of each file and then do your pre-splits based on the start key and end key that you find.
>
> That would be your tool so to speak.
>
> But to the point that reading back in these files will cause you to crash your RS and HBase?
> That doesn't sound like it's well tuned or right.
>
> HTH
> -Mike
>
> On May 7, 2013, at 5:29 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> I am not aware of a tool which can pre-split table using another table's
>> region boundaries as template.
>>
>> Such a tool would be nice to have.
>>
>> Cheers
>>
>> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> When we do an export, we are only exporting the data. Then when
>>> we import that back, we need to make sure the table is pre-split
>>> correctly, or else we might hotspot some servers.
>>>
>>> If you simply export then import without pre-splitting at all, you
>>> will most probably bring some servers down because they will be
>>> overwhelmed with splits and compactions.
>>>
>>> Do we have any tool to pre-split a table the same way another table is
>>> already split?
>>>
>>> Something like
>>>> duplicate 'source_table', 'target_table'
>>>
>>> Which will create a new table called 'target_table' with exactly the
>>> same parameters as 'source_table' and the same region boundaries?
>>>
>>> If we don't have one, would it be useful to have one?
>>>
>>> Or even something like:
>>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>>>
>>>
>>> JM
>>>
>
Ted Yu 2013-05-07, 23:11
Ted Yu 2013-05-07, 23:18
Michael Segel 2013-05-07, 23:11
Jeremy Carroll 2013-05-08, 00:08
Jean-Marc Spaggiari 2013-05-14, 00:48
Matteo Bertozzi 2013-05-14, 00:54
Jean-Marc Spaggiari 2013-05-14, 00:57