HBase, mail # user - hbase hashing algorithm and schema design


Re: hbase hashing algorithm and schema design
Joey Echeverria 2011-06-03, 12:27
Rows are split into regions of contiguous row keys. Each region is assigned to a physical server (a region server), which serves queries and updates for the rows in that region. Currently, the assignment process is random and balances only the number of regions assigned to each server.
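
To make the region lookup concrete, here is a minimal sketch of how a row key could be matched to the region whose key range contains it. This is not HBase code: the start keys and the `find_region` helper are hypothetical, and the only assumption taken from the description above is that each region covers a contiguous, sorted range of row keys.

```python
from bisect import bisect_right

# Hypothetical region start keys, kept in sorted order.
# The first region starts at the empty key, so every row key has a home.
REGION_START_KEYS = [b"", b"D", b"H", b"K"]


def find_region(row_key: bytes, start_keys=REGION_START_KEYS) -> int:
    """Return the index of the region whose [start, next_start) range holds row_key."""
    # bisect_right counts the start keys that are <= row_key;
    # the last of those belongs to the region that owns the key.
    return bisect_right(start_keys, row_key) - 1


if __name__ == "__main__":
    for key in (b"A-2011-06-03", b"D-2011-06-03", b"L-2011-06-03"):
        print(key, "-> region", find_region(key))
```

Note that the number of physical nodes does not appear anywhere in the lookup. The key determines the region, and the region-to-server assignment is a separate step.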

The problem with largely sequential key inserts is that they all go to the region hosting the end of the key space, which makes that region's server a potential bottleneck. If you want to improve write performance, you can prefix each key with a hash of the key. The downside is that sequential scans then have to be performed with multiple scanners, and the results re-ordered on the client side.
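
A rough sketch of the hashed-prefix idea and its cost on the read side. Everything here is an assumption for illustration: the bucket count, the MD5-based prefix, the helper names, and the sorted Python list that stands in for a table. A real client would open one HBase scanner per prefix instead of filtering a list.

```python
import hashlib
import heapq

NUM_BUCKETS = 4  # assumed number of hash prefixes, chosen arbitrarily


def salted_key(key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a small bucket number derived from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{key}"


def unsalt(salted: str) -> str:
    """Strip the bucket prefix to recover the original key."""
    return salted.split("-", 1)[1]


def scan_range(table, start: str, stop: str, buckets: int = NUM_BUCKETS):
    """Scan [start, stop) across all buckets and merge the results client side.

    `table` is a sorted list of salted keys standing in for the real table.
    """
    per_bucket = []
    for b in range(buckets):
        prefix = f"{b:02d}-"
        lo, hi = prefix + start, prefix + stop
        # One "scanner" per bucket; each result list is already sorted.
        per_bucket.append([unsalt(k) for k in table if lo <= k < hi])
    # Re-order the per-bucket results into a single sorted stream.
    return list(heapq.merge(*per_bucket))


if __name__ == "__main__":
    keys = [f"A-2011-06-{day:02d}" for day in range(1, 11)]
    table = sorted(salted_key(k) for k in keys)
    print(scan_range(table, "A-2011-06-03", "A-2011-06-07"))
```

Writes for consecutive dates now land under different prefixes, but every range query fans out into one scan per bucket, so the bucket count trades write spread against read cost.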

-Joey

On Jun 3, 2011, at 3:35, Sam Seigal <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am not able to find information regarding the algorithm that decides which
> region a particular row belongs to in an HBase cluster. Does the algorithm
> take into account the number of physical nodes? Where can I find more
> details about it?
>
> I went through the HBase book and the OpenTSDB schema examples on schema
> definitions and problems with monotonically increasing row keys, and had a
> follow up question.
>
> I want to be able to query on ranges of time in HBase. Following the
> OpenTSDB example, I have the following row key format:
>
> <eventid> - <yyyy-mm-dd>
>
> My eventId can be one of 12 distinct values (let us say from A-L), and I
> have a 4 node cluster running HBase right now. However, these event id
> values are not evenly distributed. I believe that this implies some of the
> regions in the cluster are going to grow faster in size than others, and
> eventually will either automatically split or have to be manually split.
> Should this be a concern at this point? How is HBase deciding which
> partition a particular key will go to? I feel that knowing more details
> about the algorithm can help me design the schema better.
>
> Your help is appreciated.
>
> Thank you.
>
> Sam