HBase >> mail # user >> Custom versioning best practices


David Koch 2012-11-22, 13:55
Michael Segel 2012-11-22, 16:11
David Koch 2012-11-22, 20:47

Re: Custom versioning best practices
Hi David,

As far as I know, HBase currently doesn't support specifying a separate
setMaxVersions for each column family in a single Scan object.

HTH,
Anil
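
One possible client-side workaround for the limitation above, as a minimal sketch and not anything taken from the HBase API: ask the scan for the largest version count any family needs, then trim each family's history to its own limit once the results are back. The names `PerFamilyVersions`, `newest` and `trim` are hypothetical, plain Java maps stand in for the scan result, and each family is reduced to a single integer column for brevity.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: models a row as family -> (timestamp -> value).
final class PerFamilyVersions {

    /** Returns the newest maxVersions entries of a timestamp -> value history, newest first. */
    static NavigableMap<Long, Integer> newest(Map<Long, Integer> versions, int maxVersions) {
        // Sort a copy by descending timestamp so the input order does not matter.
        TreeMap<Long, Integer> newestFirst = new TreeMap<>(Comparator.<Long>reverseOrder());
        newestFirst.putAll(versions);

        NavigableMap<Long, Integer> kept = new TreeMap<>(Comparator.<Long>reverseOrder());
        for (Map.Entry<Long, Integer> e : newestFirst.entrySet()) {
            if (kept.size() >= maxVersions) {
                break;
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    /** Trims every family of one row to its own limit; families without a limit keep one version. */
    static Map<String, NavigableMap<Long, Integer>> trim(
            Map<String, Map<Long, Integer>> row, Map<String, Integer> limits) {
        Map<String, NavigableMap<Long, Integer>> trimmed = new HashMap<>();
        for (Map.Entry<String, Map<Long, Integer>> family : row.entrySet()) {
            int limit = limits.getOrDefault(family.getKey(), 1);
            trimmed.put(family.getKey(), newest(family.getValue(), limit));
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Five versions with made-up timestamps 1..5.
        Map<Long, Integer> history = new HashMap<>();
        for (long ts = 1; ts <= 5; ts++) {
            history.put(ts, (int) (ts * 10));
        }

        Map<String, Map<Long, Integer>> row = new HashMap<>();
        row.put("daily", history);
        row.put("meta", history);

        Map<String, Integer> limits = new HashMap<>();
        limits.put("daily", 3); // hypothetical per-family limits
        limits.put("meta", 1);

        System.out.println(trim(row, limits));
    }
}
```

The cost of this approach is that the server still ships the larger number of versions for every family, so issuing one scan per family may be preferable when the limits differ widely.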

On Thu, Nov 22, 2012 at 12:47 PM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello Michael,
>
> Thank you for your response.
>
> By the way, is it possible to call setMaxVersions per column family on a
> scan?
>
> /David
>
>
> On Thu, Nov 22, 2012 at 5:11 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > IMHO, the best practice is not to do this.
> >
> > It's an abuse of versioning, and if you really want to store temporal data,
> > make it part of the column name.
> >
> >
> > On Nov 22, 2012, at 7:55 AM, David Koch <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I was thinking of using versions with custom timestamps to store the
> > > evolution of a column value - as opposed to creating several (time_t,
> > > value_at_time_t) qualifier-value pairs. The value to be stored is a
> > > single integer. Fast ad-hoc retrieval of multiple versions based on a
> > > row key + filter [1] (i.e. through a web service) is important; the
> > > number of row keys will be between 10^6 and 10^9.
> > >
> > > a) If the number of versions (timestamps) is moderate, can I expect
> > > read/filtering performance to be better than when using multiple
> > > qualifier/value pairs?
> > > b) For a larger number of versions, say 365, what precautions, if any,
> > > should I take with respect to the HBase/table setup?
> > >
> > > I looked around a bit and found the following:
> > >
> > > The documentation [2] mentions that the maximum number of versions
> > > should not be too high ("in the hundreds"). The HBase O'Reilly book [3],
> > > on the other hand, mentions that Facebook use(d) versions to store inbox
> > > messages in order. Clearly, the number of messages may grow quite large
> > > (>> 100). Is [1] still valid with more recent versions of HBase?
> > >
> > > Thank you,
> > >
> > > /David
> > >
> > > [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
> > > [2] http://hbase.apache.org/book/schema.versions.html
> > > [3] 1st edition, page 384
> >
> >
>

--
Thanks & Regards,
Anil Gupta
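
Michael's suggestion in the quoted thread, making time part of the column name, could look roughly like the sketch below. It assumes timestamps are non-negative epoch milliseconds and stores `Long.MAX_VALUE - timestamp` as an 8-byte big-endian qualifier, so that under the unsigned lexicographic byte ordering HBase uses for qualifiers the newest value sorts first. `TimeQualifier` and its methods are hypothetical names, and a `TreeMap` stands in for a row; no HBase client calls are used.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: encodes a timestamp into a column qualifier.
final class TimeQualifier {

    /** Encodes a non-negative epoch-millisecond timestamp so that newer times sort first. */
    static byte[] encode(long epochMillis) {
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - epochMillis).array();
    }

    /** Recovers the original timestamp from an encoded qualifier. */
    static long decode(byte[] qualifier) {
        return Long.MAX_VALUE - ByteBuffer.wrap(qualifier).getLong();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // A sorted map stands in for the qualifiers of one row.
        TreeMap<byte[], Integer> row = new TreeMap<>(TimeQualifier::compareUnsigned);
        row.put(encode(1353592500000L), 7);  // made-up sample values
        row.put(encode(1353617220000L), 9);
        row.put(encode(1353600660000L), 8);

        // Iteration order is newest timestamp first.
        for (Map.Entry<byte[], Integer> cell : row.entrySet()) {
            System.out.println(decode(cell.getKey()) + " -> " + cell.getValue());
        }
    }
}
```

With this layout a time range becomes a contiguous range of qualifiers, which is the main design reason to prefer it over custom cell timestamps when many points per row are expected.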