HBase, mail # user - pre splitting tables

Re: pre splitting tables
Nicolas Spiegelberg 2011-10-24, 22:23
Isn't it a better strategy to create the HBase keys as

Key = hash(MySQL_key) + MySQL_key

That way you'll know your key distribution and can add new machines
seamlessly.  I'm assuming that your rows don't overlap between any 2
machines.  If they do, you could append the MACHINE_ID to the key (not
prepend).  I don't think you want the machine # as the first dimension on
your rows, because you want the data from new machines to be evenly spread
out across the existing regions.
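
To make the key scheme above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: MD5 as the hash, a 4-character hex prefix, the "-" separator before the machine id, and the function names are all hypothetical choices, not something this thread specifies.

```python
import hashlib
from typing import List, Optional

PREFIX_LEN = 4  # assumption: number of hex characters of the hash to prepend


def make_row_key(mysql_key: str, machine_id: Optional[str] = None) -> bytes:
    """Build Key = hash(MySQL_key) + MySQL_key, optionally appending a machine id."""
    # MD5 is an assumed choice; any well-distributed hash would serve here.
    digest = hashlib.md5(mysql_key.encode("utf-8")).hexdigest()
    key = digest[:PREFIX_LEN] + mysql_key
    if machine_id is not None:
        # Appended (not prepended), so it does not affect region placement.
        key += "-" + machine_id
    return key.encode("utf-8")


def split_points(num_regions: int) -> List[bytes]:
    """Evenly spaced boundaries over the hex prefix space, for pre-splitting."""
    space = 16 ** PREFIX_LEN
    return [
        format(i * space // num_regions, "0{}x".format(PREFIX_LEN)).encode("ascii")
        for i in range(1, num_regions)
    ]
```

Because the prefix is roughly uniform over a known range, boundaries such as those from split_points(4) could be supplied as split keys when the table is created, and rows from a new machine should spread across the existing regions. The trade-off is that rows are no longer stored in MySQL_key order, so in-order scans over the original keys are lost.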
On 10/24/11 9:07 AM, "Stack" <[EMAIL PROTECTED]> wrote:

>On Mon, Oct 24, 2011 at 1:27 AM, Sam Seigal <[EMAIL PROTECTED]> wrote:
>> According to the HBase book, pre splitting tables and doing manual
>> splits is a better long term strategy than letting HBase handle it.
>It's good for getting a table off the ground, yes.
>> Since I do not know what the keys from the prod system are going to
>> look like, I am adding a machine number prefix to the row keys
>> and pre splitting the tables based on the prefix (prefix 0 goes to
>> machine A, prefix 1 goes to machine B, etc.).
>You don't need to do an in-order scan of the data?  What's the rest of your
>row key look like?
>> Once I decide to add more machines, I can always do a rolling split
>> and add more prefixes.
>> Is this a good strategy for pre splitting the tables?
>So, you'll start out with one region per server?
>What do you think the rate of splitting will be like?  Are you using
>default region size or have you bumped this up?