HBase, mail # user - Re: How would you model this in Hbase?


Re: How would you model this in Hbase?
Michael Segel 2013-02-06, 21:49
Overloading the timestamp, i.e. the versions of a cell, is really not a good idea.

Yeah, I know opinions are like A.... everyone has one. ;-)

But you have to be aware that if someone decides to delete some data... well, one tombstone marker for the column and it's goodbye to every version of the cell at or before the marker's timestamp.
(Any ideas on a clean easy way to remove that tombstone?  ;-)
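
To make the tombstone point concrete, here is a rough sketch in plain Python. It is a toy in-memory model of a single column, not the HBase client API; the `Column` class, its method names, and the integer timestamps are all made up for illustration. It only shows the masking rule: a column delete marker hides every version whose timestamp is at or before the marker's.

```python
# Toy in-memory model of one HBase column. Hypothetical names throughout;
# this is NOT the HBase client API.
class Column:
    def __init__(self):
        self.puts = {}                # timestamp -> value
        self.column_tombstone = None  # timestamp of the newest column delete marker

    def put(self, ts, value):
        self.puts[ts] = value

    def delete_column(self, ts):
        # Keep only the newest marker; it masks everything at or before it.
        if self.column_tombstone is None or ts > self.column_tombstone:
            self.column_tombstone = ts

    def visible_versions(self):
        # Versions a reader would see, newest first.
        return [
            (ts, value)
            for ts, value in sorted(self.puts.items(), reverse=True)
            if self.column_tombstone is None or ts > self.column_tombstone
        ]


col = Column()
col.put(20111101, 2.0)        # first release
col.put(20111201, 2.5)        # revision
col.delete_column(20111215)   # one delete marker for the column
print(col.visible_versions())  # → []

col.put(20111115, 2.2)        # back-dated put, older than the marker
print(col.visible_versions())  # → [] (still masked)
```

In this toy the marker never goes away. In a real cluster it stays until a major compaction removes it, which is why back-dated puts and deletes interact so badly.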

You're better off using other methods of adding dimension to your cell. One that works well is using Avro.
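
A minimal sketch of what "adding dimension inside the cell" could look like. To keep it self-contained it uses JSON from the Python standard library as a stand-in for Avro, so this is not Avro and not HBase code; the helper names `add_revision` and `value_as_of` and the record layout are assumptions made for the example. The idea is that one cell holds the whole list of revisions, so the cell's own timestamp is left alone.

```python
import json


def add_revision(cell_bytes, release, value):
    """Return new cell bytes with one more (release, value) revision recorded.

    cell_bytes is the current serialized cell, or None for an empty cell.
    Release dates are ISO 'YYYY-MM-DD' strings, which sort chronologically.
    """
    revisions = json.loads(cell_bytes) if cell_bytes else []
    revisions.append({"release": release, "value": value})
    revisions.sort(key=lambda r: r["release"])
    return json.dumps(revisions).encode("utf-8")


def value_as_of(cell_bytes, as_of):
    """Latest value released on or before as_of, or None if none was released yet."""
    known = [r for r in json.loads(cell_bytes) if r["release"] <= as_of]
    return known[-1]["value"] if known else None


# Unemployment for reporting month 2011-10, first released in November,
# then revised in December.
cell = add_revision(None, "2011-11-01", 2)
cell = add_revision(cell, "2011-12-01", 2.5)

print(value_as_of(cell, "2011-11-15"))  # → 2
print(value_as_of(cell, "2011-12-01"))  # → 2.5
print(value_as_of(cell, "2011-10-01"))  # → None
```

The trade-off in this sketch is that every revision becomes a read-modify-write of the whole cell, so concurrent writers would need some form of coordination.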

When I teach a course on HBase, I cover cells in the schema design section. I talk about the ability to use versioning as a way to add dimension, and then tell the students that this really isn't a good idea ...

-Just saying...

On Feb 6, 2013, at 3:05 PM, Ian Varley <[EMAIL PROTECTED]> wrote:

> Alex,
>
> This might be an interesting use of the time dimension in HBase. Every value in HBase is uniquely represented by a set of coordinates:
>
> - table
> - row key
> - column family
> - column qualifier
> - timestamp
>
> So, you can have two different values that have all the same coordinates, except their timestamp. So for your example, that could be:
>
> - table: econ
> - row key: "indicatorABC"
> - column family: cf1
> - column qualifier: "reporting_2011-10-01"
>
> first value:
> - timestamp: "2011-11-01 00:00:00.000"
> - value: 2
>
> second value:
> - timestamp: "2011-12-01 00:00:00.000"
> - value: 2.5
>
> I.e., if you load the data such that the timestamps on the values represent the release date, then you can model this in a natural way. By default, reads in HBase will only give you the latest value, but you can manually tell a scanner to give you "time travel" by only reporting values as of an older date; so you could say "tell me what the data would have said on 11/01".
>
> (Also, by default, HBase only keeps a limited number of historical versions (3), but you can tell it to keep all versions.)
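
For readers who want to see the "time travel" read spelled out, here is a small in-memory sketch in Python. It is not the HBase client API: `VersionedTable`, `ts_ms`, and the `as_of` parameter are invented for illustration, and the default of 3 kept versions simply mirrors the number quoted above. It stores values under the five coordinates listed earlier and answers "what would the data have said on a given date".

```python
from datetime import datetime, timezone


def ts_ms(date_str):
    """Milliseconds since the epoch for a 'YYYY-MM-DD' date at 00:00 UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


class VersionedTable:
    """Toy model of one table: (row, family, qualifier) -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed to be >= 1
        self.cells = {}

    def put(self, row, family, qualifier, ts, value):
        versions = self.cells.setdefault((row, family, qualifier), {})
        versions[ts] = value
        # Drop everything except the newest max_versions timestamps.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, family, qualifier, as_of=None):
        """Newest value, or the newest value with timestamp <= as_of."""
        versions = self.cells.get((row, family, qualifier), {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(eligible)] if eligible else None


econ = VersionedTable()
coords = ("indicatorABC", "cf1", "reporting_2011-10-01")
econ.put(*coords, ts_ms("2011-11-01"), 2)    # first release
econ.put(*coords, ts_ms("2011-12-01"), 2.5)  # revision

print(econ.get(*coords))                             # → 2.5
print(econ.get(*coords, as_of=ts_ms("2011-11-01")))  # → 2
print(econ.get(*coords, as_of=ts_ms("2011-10-01")))  # → None
```

The toy treats `as_of` as inclusive. HBase time ranges on a scan exclude the upper bound, so a real query for "as of 11/01" would need an upper bound just past that instant.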
>
> There are some downsides to using the time dimension explicitly like this:
> - If you back date things and also work with deletes, you could get some weird behavior depending on when compaction runs.
> - If you have lots of versions of things, the server still has to read over these when you scan, which makes things slower. (Probably doesn't apply if you only have a couple historical versions of any given value.)
>
> All the usual caveats apply: don't bother with HBase unless you've got some serious size in your data (e.g. TB) and need to support a heavy load of real-time updates and queries. Otherwise, go with something simpler to operate like a relational database, couchdb, etc.
>
> Ian
>
> On Feb 6, 2013, at 2:24 PM, Alex Grund wrote:
>
> Hi,
>
> I am a newbie to NoSQL databases and I am wondering how to model a
> specific case with HBase.
>
> The things I want to model are economic time series, such as the
> unemployment rate in a given country.
>
> The complicated thing is this: values of an economic time series can,
> but do not have to, be revised.
>
> An example:
>
> Imagine the time series is published monthly, on the first day of a
> month, with the value for the previous month, like this:
>
> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
> Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
> Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
>
> (where "release" is the date of release and "reporting" is the date of
> the month the "value" refers to. Read: "On Dec 1, 2011 the
> unemployment rate for Nov 2011 was reported to be 1".)
>
> Now imagine that on every release the value for the previous month
> is revised, like this:
>
> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
> Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
>
> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
> Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com