Custom versioning best practices
David Koch 2012-11-22, 13:55
Hello,
I was thinking of using versions with custom timestamps to store the evolution of a column value, as opposed to creating several (time_t, value_at_time_t) qualifier-value pairs. The value to be stored is a single integer. Fast ad-hoc retrieval of multiple versions based on a row key + filter [1] (i.e. through a web service) is important; the number of row keys will be between 10^6 and 10^9.

a) If the number of versions (timestamps) is moderate, can I expect read/filtering performance to be better than when using multiple qualifier/value pairs?

b) For a larger number of versions, say 365, what precautions, if any, should I take with respect to the HBase/table setup?

I looked around a bit and found the following: the documentation [2] mentions that the maximum number of versions should not be too high ("in the hundreds"). The O'Reilly HBase book [3], on the other hand, mentions that Facebook use(d) versions to store inbox messages in order. Clearly, the number of messages may grow quite large (>> 100).

Is [1] still valid with more recent versions of HBase?

Thank you,

/David

[1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
[2] http://hbase.apache.org/book/schema.versions.html
[3] 1st edition, page 384
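
For illustration only, the versioned layout asked about above can be modelled in plain Java without any HBase dependency. This is a rough sketch, not HBase code: the class `VersionLayoutSketch`, the nested `VersionedColumn`, and the method `getAt` are hypothetical names, and `getAt` only mimics the idea of selecting cells by an explicit list of timestamps, as the filter in [1] does. The model also evicts old versions immediately on write, which is a simplification of how a real store would apply a maximum-versions setting.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionLayoutSketch {

    /** Hypothetical model of one column holding several timestamped versions. */
    static final class VersionedColumn {
        private final int maxVersions;
        // Newest timestamp first, so the last entry is always the oldest version.
        private final NavigableMap<Long, Integer> cells =
                new TreeMap<>(Comparator.reverseOrder());

        VersionedColumn(int maxVersions) {
            this.maxVersions = maxVersions;
        }

        /** Stores a value under a caller-supplied (custom) timestamp. */
        void put(long timestamp, int value) {
            cells.put(timestamp, value);
            // Simplification: drop the oldest versions as soon as the limit is exceeded.
            while (cells.size() > maxVersions) {
                cells.pollLastEntry();
            }
        }

        /** Returns only the versions whose timestamps were explicitly requested. */
        Map<Long, Integer> getAt(Collection<Long> timestamps) {
            Map<Long, Integer> result = new TreeMap<>(Comparator.reverseOrder());
            for (Long ts : timestamps) {
                Integer value = cells.get(ts);
                if (value != null) {
                    result.put(ts, value);
                }
            }
            return result;
        }

        int size() {
            return cells.size();
        }
    }

    public static void main(String[] args) {
        // One version per day, keeping at most 365 versions (the figure from question b).
        VersionedColumn column = new VersionedColumn(365);
        for (long day = 1; day <= 400; day++) {
            column.put(day, (int) (day * 10));
        }
        System.out.println("versions kept: " + column.size());
        System.out.println("selected: " + column.getAt(Arrays.asList(10L, 200L, 400L)));
    }
}
```

In this model the version limit has to be at least as large as the number of data points that must stay readable (365 here); anything older is silently dropped, which is the main behavioural difference from storing each (time_t, value_at_time_t) pair under its own qualifier.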