HBase >> mail # user >> pre splitting tables


Re: pre splitting tables
Isn't it a better strategy to create the HBase keys as

Key = hash(MySQL_key) + MySQL_key

That way you'll know your key distribution and can add new machines
seamlessly.  I'm assuming that your rows don't overlap between any 2
machines.  If they do, you could append the MACHINE_ID to the key (not
prepend it).  I don't think you want the machine number as the first
dimension of your rows, because you want the data from new machines to be
evenly spread out across the existing regions.
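The hash-prefix scheme above can be sketched as follows; the choice of MD5 and a two-character hex prefix are illustrative assumptions, not something the thread specifies:

```python
import hashlib

def salted_key(mysql_key: str, prefix_len: int = 2) -> bytes:
    """Build an HBase row key as hash(MySQL_key) + MySQL_key.

    MD5 and a 2-character hex prefix are illustrative choices; any
    stable hash works.  The prefix spreads writes evenly across
    pre-split regions, and keeping the original key as the suffix
    means a point get is still possible by recomputing the hash.
    """
    digest = hashlib.md5(mysql_key.encode("utf-8")).hexdigest()
    return (digest[:prefix_len] + mysql_key).encode("utf-8")
```

The trade-off, which Stack's follow-up question touches on, is that hashing the prefix destroys the natural sort order of the original keys, so in-order range scans over MySQL_key are no longer possible.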
On 10/24/11 9:07 AM, "Stack" <[EMAIL PROTECTED]> wrote:

>On Mon, Oct 24, 2011 at 1:27 AM, Sam Seigal <[EMAIL PROTECTED]> wrote:
>> According to the HBase book, pre-splitting tables and doing manual
>> splits is a better long-term strategy than letting HBase handle it.
>>
>
>It's good for getting a table off the ground, yes.
>
>
>> Since I do not know what the keys from the prod system are going to
>> look like, I am adding a machine number prefix to the row keys
>> and pre-splitting the tables based on the prefix (prefix 0 goes to
>> machine A, prefix 1 goes to machine B, etc.).
>>
>
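The machine-prefix scheme maps directly to concrete split points. A minimal sketch, assuming single-character decimal prefixes as in the example above:

```python
def prefix_splits(num_machines: int) -> list[bytes]:
    """Split points for a table pre-split on a machine-number prefix.

    Assumes row keys start with a decimal machine prefix '0', '1', ...
    as in the example above.  N machines need N-1 boundaries: the first
    region holds keys < '1', the next holds ['1', '2'), and so on.
    """
    return [str(i).encode("utf-8") for i in range(1, num_machines)]
```

The resulting byte arrays can be passed as `SPLITS` to `create` in the HBase shell, or as the `splitKeys` argument to `HBaseAdmin.createTable` in Java.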
>You don't need to do an in-order scan of the data?  What does the rest
>of your row key look like?
>
>
>> Once I decide to add more machines, I can always do a rolling split
>> and add more prefixes.
>>
>
>Yes.
>
>> Is this a good strategy for pre-splitting the tables?
>>
>
>So, you'll start out with one region per server?
>
>What do you think the rate of splitting will be like?  Are you using
>default region size or have you bumped this up?
>
>St.Ack
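On Stack's question about region size: the split threshold he refers to is `hbase.hregion.max.filesize` in hbase-site.xml. A hedged example of bumping it; the 1 GB value is illustrative, not from the thread:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- Split a region once a store file exceeds this size, in bytes.
       1073741824 (1 GB) is an illustrative value, not from the thread. -->
  <value>1073741824</value>
</property>
```

A larger maximum delays automatic splits, which fits the manual/pre-splitting strategy discussed above.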