HBase, mail # user - Custom versioning best practices


David Koch 2012-11-22, 13:55
Re: Custom versioning best practices
Michael Segel 2012-11-22, 16:11
IMHO, the best practice is not to do this.

It's an abuse of versioning, and if you really want to store temporal data, make it part of the column name.
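Michael's suggestion amounts to a "wide row": one qualifier per point in time instead of one version per point in time. The following is a minimal plain-JDK sketch, not HBase client code, and the class and method names are hypothetical. It assumes timestamps are non-negative epoch milliseconds and relies on HBase ordering the qualifiers within a row by unsigned byte comparison, so an 8-byte big-endian encoding sorts chronologically, and subtracting from Long.MAX_VALUE sorts newest first.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: encodes a timestamp into an 8-byte column qualifier.
public class TimeQualifier {

    // Oldest-first encoding: big-endian bytes of a non-negative timestamp.
    // Unsigned byte-wise comparison of these arrays matches numeric order.
    public static byte[] ascending(long ts) {
        if (ts < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(ts).array();
    }

    // Newest-first encoding: invert the timestamp so later times sort earlier.
    // Any negative input also yields a negative inverted value and is rejected.
    public static byte[] descending(long ts) {
        return ascending(Long.MAX_VALUE - ts);
    }

    // Recovers the timestamp from an oldest-first qualifier.
    public static long decodeAscending(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    public static void main(String[] args) {
        long earlier = 1353592500000L; // hypothetical epoch-millisecond values
        long later = 1353600000000L;

        // Arrays.compareUnsigned mirrors how HBase orders qualifiers in a row.
        System.out.println(Arrays.compareUnsigned(ascending(earlier), ascending(later)) < 0);   // true
        System.out.println(Arrays.compareUnsigned(descending(later), descending(earlier)) < 0); // true
        System.out.println(decodeAscending(ascending(later)) == later);                         // true
    }
}
```

In the HBase client, org.apache.hadoop.hbase.util.Bytes.toBytes(long) produces the same big-endian encoding. Whether to store oldest-first or newest-first is a design choice; newest-first suits reads that usually want the most recent values.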
On Nov 22, 2012, at 7:55 AM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I was thinking of using versions with custom timestamps to store the
> evolution of a column value - as opposed to creating several (time_t,
> value_at_time_t) qualifier-value pairs. The value to be stored is a single
> integer. Fast ad-hoc retrieval of multiple versions based on a row key +
> filter [1] (i.e. through a web service) is important; the number of row keys
> will be between 10^6 and 10^9.
>
> a) If the number of versions (timestamps) is moderate, can I expect
> read/filtering performance to be better than when using multiple
> qualifier/value pairs?
> b) For a larger number of versions, say 365, what precautions, if any, should
> I take with respect to the HBase/table setup?
>
> I looked around a bit and found the following:
>
> The documentation [2] mentions that the maximum number of versions should
> not be too high ("in the hundreds"). The HBase O'Reilly book [3], on the
> other hand mentions that Facebook use(d) versions to store inbox messages
> in order. Clearly, the number of messages may grow quite large (>> 100). Is
> the guidance in [2] still valid with more recent versions of HBase?
>
> Thank you,
>
> /David
>
> [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
> [2] http://hbase.apache.org/book/schema.versions.html
> [3] 1st edition, page 384
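David's version-based design can be modeled the same way. This is again a plain-JDK sketch with hypothetical names, not HBase code: one column's versions are kept newest first, a maxVersions cap stands in for the column family's maximum-versions setting, and filterByTimestamps mimics what TimestampsFilter [1] does on a read, keeping only cells whose timestamps are in the requested set. One simplification to flag: the model drops excess versions eagerly on write, whereas HBase hides them on read and removes them during compactions. For question b) the practical point is the same either way: keeping 365 versions requires a maximum-versions setting of at least 365, otherwise the oldest versions disappear.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of one versioned HBase cell holding an integer.
public class VersionedCellModel {
    private final int maxVersions;
    // timestamp -> value, iterated newest first (HBase returns versions this way)
    private final NavigableMap<Long, Integer> versions = new TreeMap<>(Collections.reverseOrder());

    public VersionedCellModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Writes a value at an explicit timestamp; the same timestamp overwrites.
    // Simplification: excess versions are dropped immediately (oldest first).
    public void put(long ts, int value) {
        versions.put(ts, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order = oldest
        }
    }

    // Mimics TimestampsFilter: keep only versions whose timestamp was requested.
    public Map<Long, Integer> filterByTimestamps(Set<Long> wanted) {
        Map<Long, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> e : versions.entrySet()) {
            if (wanted.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VersionedCellModel cell = new VersionedCellModel(3); // hypothetical cap
        for (long day = 1; day <= 5; day++) {
            cell.put(day, (int) (day * 10));
        }
        // Days 1 and 2 exceeded the cap, so only days 3, 4 and 5 remain.
        System.out.println(cell.filterByTimestamps(Set.of(1L, 3L, 5L))); // {5=50, 3=30}
    }
}
```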
David Koch 2012-11-22, 20:47
anil gupta 2012-11-22, 21:12