HBase, mail # user - setTimeRange for HBase Increment


Re: setTimeRange for HBase Increment
Gary Helmling 2011-10-04, 17:36
Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.
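
To make that concrete, here's a rough sketch of what the call looks like with the Java client (every table, row, family, and qualifier name below is just a placeholder, not anything from your schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeIncrementSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");                 // placeholder table

    long now = System.currentTimeMillis();
    long thirtyDaysMs = 30L * 24 * 60 * 60 * 1000;

    Increment inc = new Increment(Bytes.toBytes("event-123"));   // placeholder row
    inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c1"), 1L);
    // The range only scopes the lookup: a previous cell with a timestamp in
    // [now - 30 days, now) gets incremented; otherwise a new cell with value 1
    // is written at the current time.
    inc.setTimeRange(now - thirtyDaysMs, now);

    Result r = table.increment(inc);
    System.out.println("current value: " +
        Bytes.toLong(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("c1"))));
    table.close();
  }
}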

As others have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.
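
Something like this would set a 30-day TTL when creating the table (again, all names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class TtlSetupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("f"));
    family.setTimeToLive(30 * 24 * 60 * 60);   // TTL is in seconds: 30 days
    family.setMaxVersions(1);                  // only keep the newest cell

    HTableDescriptor table = new HTableDescriptor(Bytes.toBytes("counters"));
    table.addFamily(family);
    admin.createTable(table);
  }
}

On an existing table you can do the equivalent from the shell with something like alter 'counters', {NAME => 'f', TTL => 2592000} (disabling the table first if your version requires it). Either way, expired cells are only physically removed at compaction time, which the book links quoted further down cover.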

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2
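
In code, the only difference between step #1 and step 4b is the minStamp you pass (same placeholder names as the earlier sketch):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class WiderWindowIncrementSketch {
  // Same placeholder table/family/qualifier as the sketch above.
  static void incrementWithSixtyDayWindow(HTable table) throws Exception {
    long now = System.currentTimeMillis();
    long sixtyDaysMs = 60L * 24 * 60 * 60 * 1000;

    Increment inc = new Increment(Bytes.toBytes("event-123"));
    inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c1"), 1L);
    inc.setTimeRange(now - sixtyDaysMs, now);   // wide enough to see the 31-day-old cell
    table.increment(inc);                       // so that cell's value is incremented
  }
}
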
I hope this clarifies things.

--gh
On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <[EMAIL PROTECTED]> wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.
>
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 09/29/2011 03:32 PM, Doug Meil wrote:
>
>>
>> Here are a few links on table cleanup and major compactions...
>>
>> http://hbase.apache.org/book.html#schema.minversions  (ttl related)
>>
>> http://hbase.apache.org/book.html#perf.deleting.queue
>>
>> http://hbase.apache.org/book.html#compaction
>>
>>
>>
>>
>>
>> On 9/29/11 2:29 PM, "Ted Yu"<[EMAIL PROTECTED]>  wrote:
>>
>>> Doug Meil may point you to related doc.
>>>
>>> Take a look at this as well:
>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>
>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<[EMAIL PROTECTED]>
>>>  wrote:
>>>
>>>> Hm, well I didn't mention a number of other requirements for the feature
>>>> I'm building, but long story short, I need to keep track of millions to
>>>> billions of these counters and need the lookup time to be as close to
>>>> constant time as possible, thus I was really hoping to avoid doing table
>>>> scans.
>>>>
>>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>>> article / documentation I could read about it? Google wasn't very
>>>> helpful.
>>>>
>>>>
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
>>>>
>>>>
>>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> My advice usually regarding timestamps is if it's part of your data
>>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>>> overloading the HBase timestamps is a bad idea, especially with
>>>>> counters since there's auto-pruning done in the Memstore!
>>>>>
>>>>> I would suggest you make time part of your row key, maybe one counter
>>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>>> that refers to some event id.
>>>>>
>>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
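
For what it's worth, a rough sketch of the one-counter-per-day layout suggested above might look like this (every table, family, and key name is made up, and the 30-day cleanup still comes from a TTL on the family):

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DailyCounterSketch {
  private static final SimpleDateFormat DAY = new SimpleDateFormat("yyyyMMdd");

  // Row keys look like "<eventId>-<yyyyMMdd>", one counter cell per day.
  static void incrementToday(HTable table, String eventId) throws Exception {
    byte[] row = Bytes.toBytes(eventId + "-" + DAY.format(new Date()));
    Increment inc = new Increment(row);
    inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("count"), 1L);
    table.increment(inc);
  }

  // Sum the last 30 daily counters for an event with one bounded sequential scan.
  static long sumLast30Days(HTable table, String eventId) throws Exception {
    long now = System.currentTimeMillis();
    long thirtyDaysMs = 30L * 24 * 60 * 60 * 1000;
    byte[] start = Bytes.toBytes(eventId + "-" + DAY.format(new Date(now - thirtyDaysMs)));
    byte[] stop  = Bytes.toBytes(eventId + "-" + DAY.format(new Date(now + 24L * 60 * 60 * 1000)));

    Scan scan = new Scan(start, stop);   // stop row is exclusive, so use tomorrow
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("count"));

    long total = 0;
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        total += Bytes.toLong(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("count")));
      }
    } finally {
      scanner.close();
    }
    return total;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "daily_counters");   // placeholder table
    incrementToday(table, "event-123");
    System.out.println("last 30 days: " + sumLast30Days(table, "event-123"));
    table.close();
  }
}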