Re: setTimeRange for HBase Increment
Gary Helmling 2011-10-04, 18:52
If you just need the increments to not be visible when > 30 days old, then
put the increment columns in their own column family and set TTL=2592000 (30
days in seconds).
Note that the timestamp is updated on each increment, so a column that
always receives increments before the TTL window runs out will never expire.
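To make the interaction between TTL and refreshed timestamps concrete, here is a minimal plain-Java sketch. It does not call HBase; it only models one counter cell whose timestamp is rewritten on every increment. The class and method names (TtlRefreshSketch, isExpired, simulate) are made up for illustration, and expiry is simplified to "age >= TTL".

```java
import java.util.concurrent.TimeUnit;

// Toy model of a single counter cell under a 30-day TTL. Not HBase code.
class TtlRefreshSketch {
    // 30 days expressed in seconds, the unit used for the column family TTL.
    static final long TTL_SECONDS = TimeUnit.DAYS.toSeconds(30); // 2592000

    // Assumption: a cell counts as expired once its age reaches the TTL.
    static boolean isExpired(long cellTsMillis, long nowMillis) {
        return nowMillis - cellTsMillis >= TimeUnit.SECONDS.toMillis(TTL_SECONDS);
    }

    // Applies `count` increments of 1, spaced `gapDays` apart, and returns
    // the counter value seen after the last increment.
    static long simulate(int gapDays, int count) {
        long ts = 0;
        long value = 0;
        for (int i = 0; i < count; i++) {
            long now = TimeUnit.DAYS.toMillis((long) i * gapDays);
            if (isExpired(ts, now)) {
                value = 0; // the old cell aged out, so the counter starts over
            }
            value += 1;
            ts = now; // each increment rewrites the cell with a fresh timestamp
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(TTL_SECONDS);      // 2592000
        System.out.println(simulate(20, 10)); // 10: never idle long enough to expire
        System.out.println(simulate(31, 10)); // 1: expires before every increment
    }
}
```

With a 20-day gap the counter keeps growing indefinitely, which is the "will never expire" case described above.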
Is this the problem? Are you looking to do rolling expiration of the
increment values? If so you could do some combination of increments with
limited time ranges (always set minStamp to 12:00am of the current day to
roll over to a new version per day) or represent the truncated date in
either the column qualifier or row key. This way you're incrementing
(aggregating) over limited periods to allow for data expiration, and can
easily do summing for the period you're concerned with. Again, openTSDB
does some smart things with efficiently constructing keys for these types of
scenarios, so it's definitely worth looking at.
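As a rough sketch of the per-day rollover idea, the helpers below compute the two things you would need: the start-of-day timestamp to pass as minStamp, and a truncated-date string to use as a column qualifier (or row key component). This is plain Java with no HBase calls. The names (DayBucketSketch, startOfDayMillis, dayQualifier, lastDayQualifiers) are hypothetical, and using UTC for "12:00am" is an assumption; substitute whatever zone your reporting uses.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Helpers for bucketing increments by calendar day. Not HBase code.
class DayBucketSketch {
    // Calendar day (UTC, by assumption) containing the given instant.
    static LocalDate utcDay(long nowMillis) {
        return Instant.ofEpochMilli(nowMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Epoch millis of 00:00 on the current day; a candidate minStamp.
    static long startOfDayMillis(long nowMillis) {
        return utcDay(nowMillis).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Truncated date such as "20111004", usable as a qualifier or key part.
    static String dayQualifier(long nowMillis) {
        return utcDay(nowMillis).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Qualifiers for the most recent `days` days, newest first, so a reader
    // can fetch those columns and sum them for the period of interest.
    static List<String> lastDayQualifiers(long nowMillis, int days) {
        LocalDate today = utcDay(nowMillis);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            out.add(today.minusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return out;
    }

    public static void main(String[] args) {
        long now = Instant.parse("2011-10-04T18:52:00Z").toEpochMilli();
        System.out.println(dayQualifier(now));         // 20111004
        System.out.println(lastDayQualifiers(now, 3)); // [20111004, 20111003, 20111002]
    }
}
```

With the date in the qualifier, each day's counter stops receiving writes once the day is over, so a column family TTL can age it out normally.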
If neither of these really addresses what you're looking for, maybe you can
explain your requirements in a bit more detail? HBase schema design is a
fine art, but it helps to be able to see the big picture.
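For reference, the step-by-step scenario quoted below (steps 0 through 4b) can be replayed with a small in-memory model. This is only a sketch of the behavior as described in this thread, not the HBase implementation: timestamps are in whole days for readability, the range is treated as minStamp inclusive and maxStamp exclusive, and a matched version is replaced by the new one. The class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one column's versions under time-ranged increments. Not HBase code.
class TimeRangeIncrementSketch {
    // Versions of the column, keyed by timestamp (in days).
    final TreeMap<Long, Long> versions = new TreeMap<>();

    // Looks up the newest version with minStamp <= ts < maxStamp. If one is
    // found it is incremented and rewritten at `now`; otherwise a new version
    // holding just `amount` is stored at `now`.
    long increment(long amount, long minStamp, long maxStamp, long now) {
        Map.Entry<Long, Long> prev =
            versions.subMap(minStamp, true, maxStamp, false).lastEntry();
        long base = 0;
        if (prev != null) {
            base = prev.getValue();
            versions.remove(prev.getKey()); // the matched version is replaced
        }
        long updated = base + amount;
        versions.put(now, updated);
        return updated;
    }

    // Value of the newest version, i.e. what a plain read would return.
    long current() {
        return versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TimeRangeIncrementSketch c1 = new TimeRangeIncrementSketch();
        c1.versions.put(0L, 2L);       // step 0: value 2, written 31 days before "now"
        c1.increment(1, 1, 31, 31);    // step 1: range = last 30 days, nothing found
        System.out.println(c1.versions); // {0=2, 31=1}
        c1.increment(1, 2, 62, 62);    // step 4b: range = last 60 days, finds ts=31
        System.out.println(c1.versions); // {0=2, 62=2}
    }
}
```

Swapping the second call for `c1.increment(1, 32, 62, 62)` reproduces step 4a instead: nothing falls in the 30-day range, so a third version with value 1 is added.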
On Tue, Oct 4, 2011 at 11:14 AM, Jameson Lopp <[EMAIL PROTECTED]> wrote:
> Thanks, that makes sense. Unfortunately, it sounds like this feature is
> unable to solve my particular problem...
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
> On 10/04/2011 01:36 PM, Gary Helmling wrote:
>> The TimeRange you set on the Increment is used in looking up the previous
>> value that you'll be incrementing. It's not stored with the incremented
>> value as a data "lifetime" or anything. If a previously stored value is
>> found within the given time range, it will be incremented. If no value is
>> found within that range, a new value is stored using the value from
>> your Increment.
>> As others have already covered, if you're looking for auto-cleanup of data
>> you would set a TTL on the column family.
>> So let me tweak your scenario a bit to explain how it might work:
>> 0) Say you have a previous value on column "c1" of 2, last incremented 31
>> days ago
>> 1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
>> days, maxStamp = now
>> 2) There is now a new version of "c1", with value=1, timestamp=now. The
>> previous version, with value=2, timestamp=now - 31 days, still exists and
>> may be automatically cleaned up, subject to your settings for max versions
>> and TTL. So you would have:
>> - v2: ts=now, value=1
>> - v1: ts=now-31days, value=2
>> 3) Reading the current value of "c1" will return 1
>> 4a) If you repeat step #1 31 days from now, you would wind up with a
>> third version of "c1", again with value=1:
>> - v3: ts=now, value=1
>> - v2: ts=now-31days, value=1
>> - v1: ts=now-62days, value=2
>> 4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
>> 60 days, maxStamp=now, then you would be incrementing the existing "v2" of
>> "c1", since it falls within the time range:
>> - v2: ts=now, value=2
>> - v1: ts=now-62days, value=2
>> I hope this clarifies things.
>> On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <[EMAIL PROTECTED]>
>>> Thanks! Nevertheless, can anyone confirm / deny if the scenario I
>>> would play out in that manner? Just want to make sure I understand the
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc
>>> On 09/29/2011 03:32 PM, Doug Meil wrote:
>>>> Here are a few links on table cleanup and major compactions...
>>>> (ttl related)