HBase >> mail # user >> Pre splitting the HBase Table for specific row key design


Re: Pre splitting the HBase Table for specific row key design
Hi Hari,

Can you please provide more details on the challenge that you are
facing?

You can pre-split using the Java client API, the HBase shell, or even the
web UI.

For the shell, you can do something like this:

    create 'transactions', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
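A caveat on the shell example: HexStringSplit places region boundaries at ASCII hexadecimal strings, which suits keys that begin with hex-encoded text such as a printable MD5. With the row key described in the question, whose leading byte(s) hold a binary bucket number from 0 to 15, every key sorts before the first hex boundary, so all rows would likely end up in the first region. One alternative is to pass explicit split keys, one per bucket, through the Java client API. The sketch below is a minimal, HBase-free illustration that only builds the split-key array. It assumes the bucket is encoded as a big-endian integer of a given width (1 for a single-byte prefix, 4 if the key was built with Bytes.toBytes(int)); the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: explicit split keys for a table whose row keys start
 * with a bucket number in [0, buckets). Produces one region per bucket.
 */
final class BucketSplits {

    /**
     * Encodes a bucket number as a big-endian prefix of the given width.
     * ASSUMPTION: width = 1 for a single-byte prefix, width = 4 if the key
     * was built with Bytes.toBytes(int), which is also big-endian.
     */
    static byte[] bucketPrefix(int bucket, int width) {
        byte[] prefix = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            prefix[i] = (byte) (bucket & 0xFF); // lowest byte goes last
            bucket >>>= 8;
        }
        return prefix;
    }

    /**
     * Returns buckets - 1 split keys (the prefixes of buckets 1..buckets-1).
     * HBase creates one more region than the number of split keys, so this
     * yields exactly one region per bucket; bucket 0 starts at the empty key.
     */
    static byte[][] splitKeys(int buckets, int width) {
        byte[][] splits = new byte[buckets - 1][];
        for (int b = 1; b < buckets; b++) {
            splits[b - 1] = bucketPrefix(b, width);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = splitKeys(16, 1);
        System.out.println(splits.length + " split keys");
        for (byte[] s : splits) {
            System.out.println(Arrays.toString(s));
        }
        // With the HBase client on the classpath (not used in this sketch),
        // the array would be passed at table creation time, e.g.
        // admin.createTable(tableDescriptor, splits);
    }
}
```

Fifteen split keys give sixteen regions, one per bucket. If a bucket later grows too large, HBase can still split that region on its own; the pre-split only fixes the starting layout.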
JM
2013/12/27 Hari Krishna <[EMAIL PROTECTED]>

> Hi,
>
> We are planning to migrate from a CDH3 cluster to a CDH4 cluster, and as part
> of the migration we also plan to use HBase instead of the Hive warehouse we
> use on the CDH3 cluster. Every day we import data from Oracle into Hadoop
> with Sqoop, from 10 different database schemas.
>
> In the Hive warehouse we maintain a table partitioned by schema name at the
> top level, with a date partition nested inside each schema partition. Each
> day's data for the table is stored in that day's date partition.
>
> In HBase we have designed the table's row key as the concatenation of the
> bucket number as a byte array (values 0 to 15, so 16 buckets in total),
> MD5(schema), MD5(date), and the pkid as a byte array. This works as
> expected: we can retrieve data by schema and date, which is our key use
> case. Each bucket holds keys ranging from 0 to Long.MAX_VALUE.
>
> Now we are having trouble pre-splitting the table (let's call it
> transactions). Can anyone help me with this?
>
> Regards,
> GHK.
>
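For reference, the row key layout described in the question can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's actual code: the bucket is stored as a single byte, the schema and date strings are UTF-8 encoded before hashing, the pkid is appended as an 8-byte big-endian long (the same layout as HBase's Bytes.toBytes(long)), and the schema name "SALES" and the date format are hypothetical. The bucket number is taken as a parameter because the message does not say how it is chosen. The sketch also shows a consequence of the design for reads: rows for one schema and date are spread over all 16 buckets, so fetching them takes one prefix scan per bucket.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the bucket + MD5(schema) + MD5(date) + pkid row key. */
final class TransactionRowKey {
    static final int MD5_LEN = 16;

    /** MD5 digest of a UTF-8 string (ASSUMPTION: UTF-8 encoding before hashing). */
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Prefix shared by every row of one schema and date inside one bucket. */
    static byte[] scanPrefix(int bucket, String schema, String date) {
        return ByteBuffer.allocate(1 + 2 * MD5_LEN)
                .put((byte) bucket) // ASSUMPTION: single-byte bucket, 0..15
                .put(md5(schema))
                .put(md5(date))
                .array();
    }

    /** Full row key: the scan prefix followed by the pkid as a big-endian long. */
    static byte[] rowKey(int bucket, String schema, String date, long pkid) {
        return ByteBuffer.allocate(1 + 2 * MD5_LEN + Long.BYTES)
                .put(scanPrefix(bucket, schema, date))
                .putLong(pkid) // ByteBuffer is big-endian by default
                .array();
    }

    public static void main(String[] args) {
        // "SALES" and "2013-12-27" are hypothetical example values.
        byte[] key = rowKey(3, "SALES", "2013-12-27", 42L);
        System.out.println("row key length: " + key.length);

        // One schema/date is spread across all 16 buckets, so a reader needs
        // one prefix scan per bucket. In HBase each prefix would become the
        // start row of a Scan (for example with a PrefixFilter); not shown here.
        for (int bucket = 0; bucket < 16; bucket++) {
            byte[] prefix = scanPrefix(bucket, "SALES", "2013-12-27");
            System.out.println("bucket " + bucket + ": " + prefix.length + "-byte prefix");
        }
    }
}
```

Under these assumptions every key is 41 bytes (1 + 16 + 16 + 8), and the leading bucket byte is exactly what explicit per-bucket split keys would be built from.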