HBase >> mail # user >> Practical Upper Limit on Number of Version Stored?


Re: Practical Upper Limit on Number of Version Stored?
Look,

Just because you can do something doesn't mean it's a good idea.

From a design perspective, it's not a good idea.

Ask yourself: why does versioning exist?  What purpose does versioning serve in HBase?

From a design perspective, you have to ask yourself what you are attempting to do.

Here the OP says:
"I guess I don't really understand why I wouldn't want to do this.  For our use case we only really care about the user's last 50 to 200 events.  We don't really care about deleting events explicitly.  More than likely we would enable a TTL to get rid of events older than a certain time. "

So his goal is to get the last N events first.

Remember columns are in sort order.
So if you use Event-XXXX or XXXX-Event as your column qualifier (name), where XXXX is (Epoch - timestamp), i.e. a reversed timestamp that gets smaller as events get newer ...
Your events will sort with the most recent event first.
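
To make the sort-order point concrete, here is a minimal sketch in plain Java. It does not touch HBase at all: a TreeMap stands in for the sorted columns of one row, and the qualifier layout (Long.MAX_VALUE minus the event time, zero-padded to 19 digits, plus an "-Event" suffix), the class name, and the helper names are all assumptions made for illustration, not something the OP described.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ReverseTimestampQualifiers {

    // Hypothetical qualifier layout: (Long.MAX_VALUE - eventTime), zero-padded
    // to 19 digits so that string order matches numeric order.
    static String qualifier(long eventTimeMillis) {
        return String.format("%019d-Event", Long.MAX_VALUE - eventTimeMillis);
    }

    // Return the values of the first n columns in sort order.
    static List<String> firstN(TreeMap<String, String> row, int n) {
        List<String> out = new ArrayList<>();
        for (String value : row.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // The TreeMap is only a stand-in for one row's sorted columns.
        TreeMap<String, String> row = new TreeMap<>();
        row.put(qualifier(1386280000000L), "login");
        row.put(qualifier(1386290000000L), "purchase");
        row.put(qualifier(1386300000000L), "logout");

        // Newest events have the smallest qualifier, so they come first.
        System.out.println(firstN(row, 2)); // prints [logout, purchase]
    }
}
```

Because the newest event sorts first, reading the first N columns of the row yields the last N events. On a real table you would pair this with whatever mechanism you use to cap the number of columns returned per row.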

This not only achieves what the OP wants, but ... I seem to recall some people posting here about methods to only return N results from a row at a time?
And here's the kicker...

From a design perspective...

Suppose you have event A occurring at time X.
Then you have event B occurring at time X2.

Are they the same?

Based on the OP's limited description, A and B are not.
So why store them as versions as if they were the same?

Versioning may make sense if we were talking about an RSVP to a function.
At time T, Bob may RSVP 'yes'.
At time T1, Bob may RSVP 'tentative'.
At time T2, Bob may RSVP 'no'.

Each version is describing the same object.
Does that make sense?
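
A toy model of the RSVP case, again in plain Java and again only a sketch: the VersionedCell class, its methods, and the eager pruning on every put are assumptions for illustration. It makes no claim about when or how HBase itself prunes old versions.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionedCell {
    private final int maxVersions;
    // Newest timestamp first, mirroring how versions of one cell are ordered.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Record a new state of the SAME object; drop the oldest beyond the limit.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = smallest timestamp = oldest
        }
    }

    // The current state of the object is simply the newest version.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int size() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCell bobsRsvp = new VersionedCell(2);
        bobsRsvp.put(1L, "yes");        // time T
        bobsRsvp.put(2L, "tentative");  // time T1
        bobsRsvp.put(3L, "no");         // time T2
        System.out.println(bobsRsvp.latest()); // prints no
        System.out.println(bobsRsvp.size());   // prints 2
    }
}
```

Each put replaces the previous answer to one question (is Bob coming?), so "latest" is meaningful. Distinct events A and B have no such relationship, which is why modelling them as versions of one cell is a poor fit.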

Good design is critical...

Just putting it out there.  ;-)
On Dec 5, 2013, at 9:50 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> Version is just a timestamp (event time) => naturally fits time-series (event) types of data.
> Besides this, events are immutable objects; if they are not, then they are not events.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Michael Segel [[EMAIL PROTECTED]]
> Sent: Thursday, December 05, 2013 5:10 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Practical Upper Limit on Number of Version Stored?
>
> You really don't want to do this.
> It's not what versioning was meant for, and it has a couple of serious flaws.
>
> The biggest flaw... what happens when you want to delete a version? ...
>
> There are other options... depending on your use case and how you use the events.
>
> Truly, using versioning for anything beyond versions of the same data is not a good idea.
>
> On Dec 5, 2013, at 4:47 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
>
>> All,
>> I am working on an HBase application where we store user events in an HBase
>> table.  The row key is a user identifier and each column is an event
>> identifier.  Most users only have a handful of events (10 or fewer), but
>> some users have a few hundred thousand events or more and this causes
>> issues when an HBase client tries to retrieve all those events.
>>
>> We are looking at different ways of limiting the number of events returned.
>> One idea is to not store each event using its own column qualifier, but
>> instead use HBase's versioning capability to store the last 100 to 200
>> events. It doesn't seem like we would run into issues with this approach,
>> but I want to see if anyone has had any practical experience in this area.
>> The advice given in http://hbase.apache.org/book/schema.versions.html is a
>> little ambiguous.
>>
>> Thanks,
>> Shawn
>
> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com