HBase, mail # user - Custom versioning best practices


David Koch 2012-11-22, 13:55
Michael Segel 2012-11-22, 16:11
David Koch 2012-11-22, 20:47
Re: Custom versioning best practices
anil gupta 2012-11-22, 21:12
Hi David,

As far as I know, HBase currently doesn't support specifying a separate
setMaxVersions value for different column families in a single Scan object.

HTH,
Anil
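
One possible workaround, sketched under assumptions: run a single scan with setMaxVersions set to the largest limit any family needs, then trim the result per column family on the client. The sketch below models only the trimming step in plain Java. VersionedCell and PerFamilyVersions are hypothetical names for illustration and are not part of the HBase API; mapping real scan results into such objects is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side model of one cell version; not an HBase class. */
final class VersionedCell {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;
    final int value;

    VersionedCell(String row, String family, String qualifier, long timestamp, int value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Hypothetical helper that applies a different version limit to each column family. */
final class PerFamilyVersions {
    /**
     * Keeps at most limits.get(family) of the newest versions of every
     * (row, family, qualifier) column. Families without an entry are kept in full.
     */
    static List<VersionedCell> trim(List<VersionedCell> cells, Map<String, Integer> limits) {
        List<VersionedCell> sorted = new ArrayList<>(cells);
        // Order by row, family, qualifier, then newest timestamp first.
        sorted.sort((a, b) -> {
            int c = a.row.compareTo(b.row);
            if (c == 0) c = a.family.compareTo(b.family);
            if (c == 0) c = a.qualifier.compareTo(b.qualifier);
            if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
            return c;
        });

        Map<String, Integer> seen = new HashMap<>();
        List<VersionedCell> kept = new ArrayList<>();
        for (VersionedCell cell : sorted) {
            // Count how many versions of this column have been visited so far.
            String column = cell.row + "\u0000" + cell.family + "\u0000" + cell.qualifier;
            int count = seen.merge(column, 1, Integer::sum);
            int limit = limits.getOrDefault(cell.family, Integer.MAX_VALUE);
            if (count <= limit) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

The trade-off is that the scan still transfers the larger number of versions for every family, so this only makes sense when the limits do not differ by much. Issuing one scan per family, each with its own setMaxVersions, is the other obvious option.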

On Thu, Nov 22, 2012 at 12:47 PM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello Michael,
>
> Thank you for your response.
>
> By the way, is it possible to specify setMaxVersions per column family on a
> scan?
>
> /David
>
>
> On Thu, Nov 22, 2012 at 5:11 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > IMHO, the best practice is not to do this.
> >
> > It's an abuse of versioning, and if you really want to store temporal data,
> > make it part of the column name.
> >
> >
> > On Nov 22, 2012, at 7:55 AM, David Koch <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I was thinking of using versions with custom timestamps to store the
> > > evolution of a column value - as opposed to creating several (time_t,
> > > value_at_time_t) qualifier-value pairs. The value to be stored is a
> > > single integer. Fast ad-hoc retrieval of multiple versions based on a
> > > row key + filter [1] (i.e., through a web service) is important; the
> > > number of row keys will be between 10^6 and 10^9.
> > >
> > > a) If the number of versions (timestamps) is moderate, can I expect
> > > read/filtering performance to be better than when using multiple
> > > qualifier/value pairs?
> > > b) For a larger number of versions, say 365, what, if any, precautions
> > > should I take with respect to the HBase/table setup?
> > >
> > > I looked around a bit and found the following:
> > >
> > > The documentation [2] mentions that the maximum number of versions
> > > should not be too high ("in the hundreds"). The HBase O'Reilly book [3],
> > > on the other hand, mentions that Facebook use(d) versions to store inbox
> > > messages in order. Clearly, the number of messages may grow quite large
> > > (>> 100). Is [1] still valid with more recent versions of HBase?
> > >
> > > Thank you,
> > >
> > > /David
> > >
> > > [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
> > > [2] http://hbase.apache.org/book/schema.versions.html
> > > [3] 1st edition, page 384
> >
> >
>

--
Thanks & Regards,
Anil Gupta
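
Michael's suggestion quoted above, making the temporal data part of the column name, could be sketched roughly as follows. The idea is to encode the timestamp as a fixed-width, big-endian qualifier so that byte-wise (unsigned, lexicographic) ordering of qualifiers matches chronological order. TemporalQualifier is a hypothetical helper written in plain Java for illustration; it is not an HBase class, and the choice of epoch milliseconds is an assumption.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that turns a timestamp into a sortable column qualifier. */
final class TemporalQualifier {
    /** Encodes a non-negative epoch-millisecond timestamp as 8 big-endian bytes. */
    static byte[] encode(long epochMillis) {
        if (epochMillis < 0) {
            // The sign bit would make negative values sort after positive ones.
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        // ByteBuffer writes in big-endian order by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(epochMillis).array();
    }

    /** Reads the timestamp back out of a qualifier produced by encode. */
    static long decode(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    /** Unsigned lexicographic byte comparison, used here to check the ordering. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With such qualifiers, each (time_t, value_at_time_t) pair becomes its own column in the row, and a range of time points corresponds to a contiguous range of qualifiers.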