HBase user mailing list: default region splitting on which value?


Re: default region splitting on which value?
Ted Yu 2013-04-20, 20:07
How many column families do you have?

For #3, pre-splitting the table at the row keys corresponding to the peaks makes sense.
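One way to act on this is to derive the split points from a sample of real keys instead of spacing them evenly over 000-999. Taking evenly spaced quantiles of the sorted sample puts more region boundaries where keys are dense. The following is a minimal sketch, not an HBase API. It assumes the row keys are fixed-width strings whose lexicographic (byte) order matches the intended region order. `quantileSplits` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class QuantileSplits {

    /**
     * Picks up to (regions - 1) split keys at evenly spaced quantiles of the
     * sorted sample, so dense key ranges get more boundaries than sparse ones.
     * Equal boundaries are collapsed, so fewer keys may come back.
     */
    static List<String> quantileSplits(List<String> sample, int regions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        TreeSet<String> splits = new TreeSet<>();
        if (sorted.isEmpty()) {
            return new ArrayList<>(splits); // nothing to base splits on
        }
        for (int i = 1; i < regions; i++) {
            int idx = (int) ((long) i * sorted.size() / regions);
            splits.add(sorted.get(idx));
        }
        return new ArrayList<>(splits);
    }

    /** Hypothetical sample over 000-999 with peaks around 020 and 060. */
    static List<String> exampleSample() {
        List<String> sample = new ArrayList<>();
        for (int k = 0; k < 1000; k += 50) {
            sample.add(String.format("%03d", k)); // sparse background keys
        }
        for (int center : new int[] {20, 60}) {
            for (int k = center - 5; k < center + 5; k++) {
                for (int r = 0; r < 5; r++) {
                    sample.add(String.format("%03d", k)); // dense peak keys
                }
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        // Split keys for 6 regions; most of them land inside the two peaks.
        System.out.println(quantileSplits(exampleSample(), 6));
    }
}
```

With evenly spaced splits (166, 333, ...), both peaks would fall into the first region. The quantile version spreads them over several regions instead. The sample should resemble the real key distribution, so it is worth drawing it from the actual data set.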

On Apr 20, 2013, at 10:52 AM, Pal Konyves <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am reading about region splitting. As I understand it, HBase handles
> region splitting by default. What I can't picture is which key it uses to
> split a region.
>
> 1) For example, if I use MD5 hashes as row keys, they are most likely
> evenly distributed from 000000... to FFFFF..., right? When HBase starts
> with one region, all writes go into that region. When the HFile gets too
> big, does it simply take, for example, the median of the stored keys and
> split the region there?
>
> 2) I want to bulk load a large amount of data using put operations from the
> HBase Java client API, and I want it to perform well. My keys are sequential
> numeric values. As I know from the post below, I cannot load them into HBase
> in sequential order, because every write would then hit the same region and
> the HBase tables are going to be sad:
> http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> So I thought I would pre-split the table into regions and load the data in
> random order. This way I would get a good distribution of network IO among
> the region servers from the beginning. Is that a good idea?
>
> 3) If my row keys are not evenly distributed in the keyspace but show
> some peaks or bursts (e.g. the range is 000-999, but most keys cluster
> around 020 and 060), is it a good idea to place the pre-split region
> boundaries at those peaks?
>
> Thanks in advance,
> Pal
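For question 1, the premise about MD5 holds. An MD5 digest is close to uniformly distributed, so hex-encoded MD5 row keys spread writes over the whole 00...ff range. Evenly spaced hex split points then give regions of roughly equal load. The following is a minimal sketch in plain Java with no HBase dependency. `md5Hex`, `hexSplits` and `firstDigitHistogram` are hypothetical helper names. HBase also ships a `RegionSplitter` utility with a `HexStringSplit` algorithm aimed at this kind of keyspace. The sketch does not model how HBase itself chooses the key during an automatic split. That key is taken from the region's store files (roughly the middle key of the largest one) and is not computed over all keys like this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class Md5KeySpread {

    /** Lowercase hex MD5 digest of the key (always 32 characters). */
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Evenly spaced two-hex-digit split points over 00..ff. */
    static List<String> hexSplits(int regions) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            splits.add(String.format("%02x", i * 256 / regions));
        }
        return splits;
    }

    /** Counts sequential keys 0..n-1 per first hex digit of their MD5. */
    static int[] firstDigitHistogram(int n) {
        int[] counts = new int[16];
        for (int i = 0; i < n; i++) {
            String h = md5Hex(Long.toString(i));
            counts[Character.digit(h.charAt(0), 16)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Adjacent input keys map to unrelated positions in the keyspace.
        System.out.println(md5Hex("1") + "  <- key 1");
        System.out.println(md5Hex("2") + "  <- key 2");
        System.out.println("splits for 4 regions: " + hexSplits(4));
        int[] counts = firstDigitHistogram(16000);
        for (int d = 0; d < 16; d++) {
            System.out.printf("%x: %d%n", d, counts[d]);
        }
    }
}
```

Lowercase hex keeps byte order consistent with the numeric order of the digest, because '0'-'9' sort before 'a'-'f'. The trade-off is that hashed keys can no longer be scanned in their original sequential order.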
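For question 2, the following is a minimal sketch of the proposed approach. It computes evenly spaced split keys over a known numeric key range, then shuffles the load order so that consecutive puts land in different pre-split regions. It assumes the keys are written as zero-padded, fixed-width decimal strings, so that byte order matches numeric order, and that the range [min, max) is known up front. `numericSplits`, `shuffledLoadOrder` and the 10-digit width are hypothetical. In a real client, the split strings could be converted with `Bytes.toBytes` and passed as the `byte[][] splitKeys` argument of `HBaseAdmin.createTable`. That call is omitted here so the sketch runs without HBase on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class NumericPreSplit {

    static final int KEY_WIDTH = 10; // hypothetical fixed key width

    /** Zero-pads a non-negative id so lexicographic order equals numeric order. */
    static String rowKey(long id) {
        return String.format("%0" + KEY_WIDTH + "d", id);
    }

    /** (regions - 1) evenly spaced split keys over [min, max). */
    static List<String> numericSplits(long min, long max, int regions) {
        List<String> splits = new ArrayList<>();
        long width = max - min;
        for (int i = 1; i < regions; i++) {
            splits.add(rowKey(min + width * i / regions));
        }
        return splits;
    }

    /** All row keys in [min, max) in a random but reproducible order. */
    static List<String> shuffledLoadOrder(long min, long max, long seed) {
        List<String> keys = new ArrayList<>();
        for (long id = min; id < max; id++) {
            keys.add(rowKey(id));
        }
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("split keys: " + numericSplits(0, 1_000_000, 4));
        List<String> order = shuffledLoadOrder(0, 20, 42L);
        // Consecutive puts now jump between regions instead of hitting one.
        System.out.println("first puts: " + order.subList(0, 5));
    }
}
```

Shuffling a full key list in memory only works while it fits. For very large loads, hashing or salting the key gives a similar spread without materializing the list.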