HBase user mailing list: setTimeRange for HBase Increment


Earlier messages in this thread:
  Jameson Lopp          2011-09-29, 18:04
  Jean-Daniel Cryans    2011-09-29, 18:12
  Jameson Lopp          2011-09-29, 18:22
  Ted Yu                2011-09-29, 18:29
  Doug Meil             2011-09-29, 19:32
  Jameson Lopp          2011-09-29, 19:40

Re: setTimeRange for HBase Increment
Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.
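
For illustration, here is a minimal sketch of such an increment against the
old (0.90/0.92-era) Java client; the table, row, family, and qualifier names
are just placeholders, not anything from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeIncrementSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "counters");   // placeholder table name

        byte[] row  = Bytes.toBytes("event-123");      // placeholder row key
        byte[] fam  = Bytes.toBytes("f");
        byte[] qual = Bytes.toBytes("c1");

        long now = System.currentTimeMillis();
        long thirtyDaysMs = 30L * 24 * 60 * 60 * 1000;

        Increment inc = new Increment(row);
        inc.addColumn(fam, qual, 1L);
        // Only an existing cell whose timestamp falls inside [minStamp, maxStamp)
        // is looked up and incremented; otherwise a brand-new cell is written.
        inc.setTimeRange(now - thirtyDaysMs, now);

        Result result = table.increment(inc);
        System.out.println("counter is now "
                + Bytes.toLong(result.getValue(fam, qual)));
        table.close();
    }
}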

As others have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.
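
If the table is created fresh, that TTL goes on the family descriptor,
roughly like this (names are placeholders; an existing family can be changed
to the same effect with a disable/alter/enable cycle):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TtlTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HColumnDescriptor family = new HColumnDescriptor("f");    // placeholder family name
        family.setTimeToLive(30 * 24 * 60 * 60);                  // TTL is in seconds: 30 days

        HTableDescriptor desc = new HTableDescriptor("counters"); // placeholder table name
        desc.addFamily(family);
        admin.createTable(desc);   // cells older than 30 days are dropped at compaction
    }
}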

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2
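
To make 4b concrete, here is a hedged sketch of that wider-range increment
plus a versioned read-back (assuming an already-open HTable and the same
placeholder row/family/qualifier handles as above):

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRangeIncrementSketch {

    /** Step 4b: a 60-day window reaches the 31-day-old cell, so that cell is incremented. */
    static void incrementAndDumpVersions(HTable table, byte[] row, byte[] fam, byte[] qual)
            throws IOException {
        long now = System.currentTimeMillis();
        long sixtyDaysMs = 60L * 24 * 60 * 60 * 1000;

        Increment inc = new Increment(row);
        inc.addColumn(fam, qual, 1L);
        inc.setTimeRange(now - sixtyDaysMs, now);   // wide enough to find the existing value
        table.increment(inc);

        // Read back every stored version of the column to see which cells remain.
        Get get = new Get(row);
        get.addColumn(fam, qual);
        get.setMaxVersions();
        Result result = table.get(get);
        for (KeyValue kv : result.raw()) {
            System.out.println("ts=" + kv.getTimestamp()
                    + "  value=" + Bytes.toLong(kv.getValue()));
        }
    }
}
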
I hope this clarifies things.

--gh
On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <[EMAIL PROTECTED]> wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.
>
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 09/29/2011 03:32 PM, Doug Meil wrote:
>
>>
>> Here are a few links on table cleanup and major compactions...
>>
>> http://hbase.apache.org/book.html#schema.minversions  (ttl related)
>>
>> http://hbase.apache.org/book.html#perf.deleting.queue
>>
>> http://hbase.apache.org/book.html#compaction
>>
>>
>>
>>
>>
>> On 9/29/11 2:29 PM, "Ted Yu"<[EMAIL PROTECTED]>  wrote:
>>
>>> Doug Meil may point you to related doc.
>>>
>>> Take a look at this as well:
>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>
>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<[EMAIL PROTECTED]>
>>>  wrote:
>>>
>>>> Hm, well I didn't mention a number of other requirements for the feature
>>>> I'm building, but long story short, I need to keep track of millions to
>>>> billions of these counters and need the lookup time to be as close to
>>>> constant time as possible, thus I was really hoping to avoid doing table
>>>> scans.
>>>>
>>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>>> article / documentation I could read about it? Google wasn't very
>>>> helpful.
>>>>
>>>>
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
>>>>
>>>>
>>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> My advice usually regarding timestamps is if it's part of your data
>>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>>> overloading the HBase timestamps is a bad idea, especially with
>>>>> counters since there's auto-pruning done in the Memstore!
>>>>>
>>>>> I would suggest you make time part of your row key, maybe one counter
>>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>>> that refers to some event id.
>>>>>
>>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
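
As a rough sketch of that per-day counter scheme, with an illustrative
eventId-plus-day row key and a bounded 30-day scan (every name below is
hypothetical, not something from this thread):

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DailyCounterSketch {

    static final byte[] FAM  = Bytes.toBytes("f");      // placeholder family name
    static final byte[] QUAL = Bytes.toBytes("count");  // placeholder qualifier

    /** Row key = eventId + "/" + yyyyMMdd, i.e. one counter row per event per day. */
    static byte[] rowKey(String eventId, Date day) {
        return Bytes.toBytes(eventId + "/" + new SimpleDateFormat("yyyyMMdd").format(day));
    }

    /** Bump today's counter for an event. */
    static void bumpToday(HTable table, String eventId) throws IOException {
        table.incrementColumnValue(rowKey(eventId, new Date()), FAM, QUAL, 1L);
    }

    /** Sum the last 30 days for one event with a short, bounded sequential scan. */
    static long sumLast30Days(HTable table, String eventId) throws IOException {
        long now = System.currentTimeMillis();
        byte[] start = rowKey(eventId, new Date(now - 30L * 24 * 60 * 60 * 1000));
        byte[] stop  = rowKey(eventId, new Date(now + 24L * 60 * 60 * 1000)); // stop row is exclusive

        Scan scan = new Scan(start, stop);
        scan.addColumn(FAM, QUAL);

        long total = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                total += Bytes.toLong(r.getValue(FAM, QUAL));
            }
        } finally {
            scanner.close();
        }
        return total;
    }
}

With a 30-day TTL on the family, as suggested, the expired daily cells fall
away on their own and each read stays a short, bounded scan rather than a
full-table pass.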

Later messages in this thread:
  Jameson Lopp          2011-10-04, 18:14
  Gary Helmling         2011-10-04, 18:52