HBase >> mail # user >> Custom versioning best practices

Custom versioning best practices

I was thinking of using versions with custom timestamps to store the
evolution of a column value - as opposed to creating several (time_t,
value_at_time_t) qualifier-value pairs. The value to be stored is a single
integer. Fast ad-hoc retrieval of multiple versions based on a row key +
filter [1] (i.e., through a web service) is important. The number of row
keys will be between 10^6 and 10^9.
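
To make the two layouts concrete, here is a toy in-memory sketch in Python. It is not HBase client code: the `table` dict, the function names and the `t:` qualifier prefix are all made up for illustration, and it says nothing about real read performance. It only shows how the same time-range query looks when the time lives in the version timestamp versus in the qualifier name.

```python
from collections import defaultdict

# Toy stand-in for a table: row key -> qualifier -> {timestamp: value}.
# This is NOT the HBase client API, only a model of the data layout.
table = defaultdict(lambda: defaultdict(dict))


def put_versioned(row, qualifier, ts, value):
    """Layout A: one qualifier, custom timestamp used as the version."""
    table[row][qualifier][ts] = value


def put_qualified(row, ts, value, write_ts=0):
    """Layout B: the time goes into the qualifier name, one version each."""
    # Zero-padding keeps lexicographic qualifier order equal to numeric order.
    table[row][f"t:{ts:010d}"][write_ts] = value


def read_versioned(row, qualifier, ts_min, ts_max):
    """Return (ts, value) pairs with ts_min <= ts < ts_max, newest first."""
    cells = table[row][qualifier]
    hits = ((ts, v) for ts, v in cells.items() if ts_min <= ts < ts_max)
    return sorted(hits, reverse=True)


def read_qualified(row, ts_min, ts_max):
    """Same range query, but selecting qualifiers instead of versions."""
    lo, hi = f"t:{ts_min:010d}", f"t:{ts_max:010d}"
    out = []
    for q, cells in table[row].items():
        if q.startswith("t:") and lo <= q < hi:
            # Each qualifier holds a single logical value: take its latest version.
            out.append((int(q[2:]), cells[max(cells)]))
    return sorted(out, reverse=True)


# Store the same three daily values under both layouts.
for day, val in [(1, 10), (2, 20), (3, 30)]:
    put_versioned("row1", "v", day, val)
    put_qualified("row2", day, val)

print(read_versioned("row1", "v", 2, 4))
print(read_qualified("row2", 2, 4))
```

Both reads return the same (time, value) pairs for the range. In layout A the selection is on version timestamps, in layout B on qualifier names.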

a) If the number of versions (timestamps) is moderate, can I expect
read/filtering performance to be better than when using multiple
qualifier/value pairs?
b) For a larger number of versions, say 365, what precautions, if any,
should I take with respect to the HBase/table setup?

I looked around a bit and found the following:

The documentation [2] mentions that the maximum number of versions should
not be too high ("in the hundreds"). The O'Reilly HBase book [3], on the
other hand, mentions that Facebook use(d) versions to store inbox messages
in order. Clearly, the number of messages may grow quite large (>> 100). Is
[2] still valid with more recent versions of HBase?

Thank you,


[2] http://hbase.apache.org/book/schema.versions.html
[3] 1st edition, page 384
Michael Segel 2012-11-22, 16:11
David Koch 2012-11-22, 20:47
anil gupta 2012-11-22, 21:12