HBase >> mail # user >> Custom versioning best practices


Custom versioning best practices
Hello,

I was thinking of using versions with custom timestamps to store the
evolution of a column value, as opposed to creating several (time_t,
value_at_time_t) qualifier-value pairs. The value to be stored is a single
integer. Fast ad-hoc retrieval of multiple versions based on a row key +
filter [1] (i.e. through a web service) is important; the number of row
keys will be between 10^6 and 10^9.
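
Concretely, this is roughly the access pattern I have in mind (a rough
sketch against the plain Java client; the table name "metrics", family "d",
qualifier "v", row key, timestamps and values are made up for illustration):

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.TimestampsFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedValueSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "metrics");    // made-up table name
    byte[] row = Bytes.toBytes("entity-00042");    // made-up row key
    byte[] cf  = Bytes.toBytes("d");               // made-up column family
    byte[] col = Bytes.toBytes("v");               // single qualifier, many versions

    // Write the integer with the observation time as the cell timestamp,
    // rather than encoding time_t into a new qualifier each time.
    long t = 1357776000000L;                       // some time_t, in milliseconds
    Put put = new Put(row);
    put.add(cf, col, t, Bytes.toBytes(17));
    table.put(put);

    // Ad-hoc read of selected versions via TimestampsFilter [1].
    List<Long> wanted = Arrays.asList(t, t - 86400000L);  // this day and the one before
    Get get = new Get(row);
    get.setMaxVersions();                          // otherwise only the newest version comes back
    get.setFilter(new TimestampsFilter(wanted));
    Result result = table.get(get);
    for (KeyValue kv : result.raw()) {
      System.out.println(kv.getTimestamp() + " -> " + Bytes.toInt(kv.getValue()));
    }
    table.close();
  }
}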

a) If the number of versions (timestamps) is moderate, can I expect
read/filtering performance to be better than when using multiple
qualifier/value pairs?
b) For a larger number of versions, say 365, what precautions, if any,
should I take with respect to the HBase/table setup?
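
For (b), my assumption is that the main knob is the maximum number of
versions on the column family, set when the table is created, e.g. (table
and family names again made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateVersionedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("metrics");  // made-up table name
    HColumnDescriptor family = new HColumnDescriptor("d");    // made-up column family
    family.setMaxVersions(365);       // keep up to ~one version per day for a year
    // family.setTimeToLive(...);     // optionally expire old versions after N seconds
    desc.addFamily(family);
    admin.createTable(desc);
    admin.close();
  }
}

or, equivalently, in the shell: create 'metrics', {NAME => 'd', VERSIONS => 365}.
Is there anything beyond that I should watch out for?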

I looked around a bit and found the following:

The documentation [2] mentions that the maximum number of versions should
not be too high ("in the hundreds"). The HBase O'Reilly book [3], on the
other hand, mentions that Facebook use(d) versions to store inbox messages
in order. Clearly, the number of messages may grow quite large (>> 100). Is
the recommendation in [2] still valid with more recent versions of HBase?

Thank you,

/David

[1]
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
[2] http://hbase.apache.org/book/schema.versions.html
[3] 1st edition, page 384