Re: Possibility of using timestamp as row key in HBase
The newly split region might be moved by the load balancer, but new writes
will still go to whichever region holds the highest keys. Aren't you
experiencing classic hot-spotting, with only one region server (RS) getting
all the write traffic? Just prepend one byte to the timestamp and round-robin
each put over the values 1 to the number of region servers.
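
A minimal sketch of that salting idea, in plain Python with no HBase client
involved. The class name, the 8-byte big-endian timestamp encoding, and the
helper scan_ranges are all my own assumptions for illustration, not anything
from the HBase API. It also assumes the table is pre-split on the salt byte,
so that each salt value lands in its own region.

```python
import struct


class SaltedKeyGenerator:
    """Hypothetical helper: prefixes each timestamp with a round-robin salt byte."""

    def __init__(self, num_buckets):
        # The salt must fit in one unsigned byte and starts at 1.
        if not 1 <= num_buckets <= 255:
            raise ValueError("num_buckets must be between 1 and 255")
        self.num_buckets = num_buckets
        self._next = 0

    def row_key(self, timestamp):
        # Salt cycles through 1..num_buckets, one step per put.
        salt = self._next + 1
        self._next = (self._next + 1) % self.num_buckets
        # 1 salt byte followed by the timestamp as an 8-byte big-endian integer,
        # so keys sort by salt first and by time within each salt.
        return struct.pack(">BQ", salt, timestamp)


def scan_ranges(num_buckets, start_ts, stop_ts):
    """Return one (start_key, stop_key) pair per salt for a time-range read."""
    return [
        (struct.pack(">BQ", salt, start_ts), struct.pack(">BQ", salt, stop_ts))
        for salt in range(1, num_buckets + 1)
    ]


gen = SaltedKeyGenerator(3)
for ts in (13, 14, 15, 16):
    key = gen.row_key(ts)
    print(key[0], struct.unpack(">Q", key[1:])[0])
```

The cost of this layout is on the read side: a scan over a time range has to
be issued once per salt value (that is what scan_ranges builds) and the
results merged by the client.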

On Wednesday, June 19, 2013, yun peng wrote:

> Hi, All,
> Our use case requires persisting a stream into a system like HBase. The
> stream data is in the format <timestamp, value>. In other words, the
> timestamp is used as the row key. We want to explore whether HBase is
> suitable for this kind of data.
> The problem is that the domain of the row key (the timestamp) grows
> constantly. For example, given 3 nodes, n1, n2 and n3, hosting the row key
> partitions [0,4], [5,9] and [10,12] respectively, it is currently the last
> node, n3, that is busy receiving incoming writes (row keys 13 and 14). This
> continues until the region reaches its max size of 5 (that is, the
> partition grows to [10,14]) and potentially splits.
> I am not an expert on HBase splits, but I am wondering: after the split,
> will new writes still go to node n3 (for [10,14]), or can the write stream
> be intelligently redirected to another, less busy node, such as n1?
> If HBase can't do this, how easy is it to extend HBase with such
> functionality? Thanks...
> Yun