HBase, mail # user - Possibility of using timestamp as row key in HBase


Possibility of using timestamp as row key in HBase
yun peng 2013-06-19, 20:04
Hi, All,
Our use case requires persisting a stream into a system like HBase. The
stream data is in the format <timestamp, value>; in other words, the timestamp
is used as the row key. We want to explore whether HBase is suitable for this
kind of data.

The problem is that the domain of the row key (the timestamp) grows constantly.
For example, given 3 nodes n1, n2, n3, hosting the row key partitions
[0,4], [5,9], [10,12] respectively, it is currently the last node, n3, that is
busy receiving incoming writes (row keys 13 and 14). This continues until
the region reaches its max size of 5 (that is, the partition grows to [10,14])
and potentially splits.
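
To make the scenario concrete, here is a toy model in Python. It is only a sketch of plain range partitioning, not HBase code: the class RangePartitionedTable, the midpoint split, and the rule "split once a region holds more than max_region_size keys" are assumptions made up for illustration, and it does not model which node hosts a region.

```python
from bisect import bisect_right, insort


class RangePartitionedTable:
    """Toy range-partitioned table (hypothetical, not the HBase API)."""

    def __init__(self, start_keys, max_region_size):
        # Each region covers [start_key, next region's start_key).
        self.start_keys = sorted(start_keys)
        self.regions = [[] for _ in self.start_keys]
        self.max_region_size = max_region_size

    def put(self, key):
        """Store key; return the start key of the region it was routed to."""
        i = bisect_right(self.start_keys, key) - 1
        if i < 0:
            raise ValueError("key is below the first region's start key")
        routed_to = self.start_keys[i]
        insort(self.regions[i], key)
        # Assumed split rule: split once the region exceeds the max size.
        if len(self.regions[i]) > self.max_region_size:
            self._split(i)
        return routed_to

    def _split(self, i):
        # Assumed midpoint split into a lower and an upper daughter region.
        keys = self.regions[i]
        mid = len(keys) // 2
        self.regions[i:i + 1] = [keys[:mid], keys[mid:]]
        self.start_keys.insert(i + 1, keys[mid])


# Three regions starting at 0, 5 and 10, preloaded with timestamps 0..12.
table = RangePartitionedTable(start_keys=[0, 5, 10], max_region_size=5)
for ts in range(13):
    table.put(ts)

# New timestamps keep arriving in increasing order.
routed = [table.put(ts) for ts in range(13, 18)]
print(routed)            # [10, 10, 10, 13, 13]
print(table.start_keys)  # [0, 5, 10, 13]
```

In this model every new timestamp is routed to whichever region currently has the highest start key, both before and after the split; the regions starting at 0 and 5 never receive another write.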

I am not an expert on HBase splits, but I am wondering whether, after the
split, new writes will still go to node n3 (which held [10,14]), or whether the
write stream can be intelligently redirected to another, less busy node such as n1.

If HBase can't do this, how easy would it be to extend HBase with
such functionality? Thanks...
Yun