HBase, mail # user - How would you model this in Hbase?

Re: How would you model this in Hbase?
James Taylor 2013-02-06, 22:01
Another approach would be to use Phoenix
(http://github.com/forcedotcom/phoenix). You can model your schema as
you would in the relational world, but you get the horizontal
scalability of HBase.


On 02/06/2013 01:49 PM, Michael Segel wrote:
> Overloading the time stamp aka the versions of the cell is really not a good idea.
> Yeah, I know opinions are like A.... everyone has one. ;-)
> But you have to be aware that if someone decides to delete some data... well one tombstone marker for the column, goodbye all of the versions of the cell.
> (Any ideas on a clean easy way to remove that tombstone?  ;-)
> You're better off using other methods of adding dimension to your cell. One that works well is using Avro.
> When I teach a course on HBase, I cover cells in the schema design section. I talk about the ability to use versioning as a way to add dimension, and then tell the students that this really isn't a good idea ...
> -Just saying...
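The tombstone concern above can be illustrated with a toy in-memory model. This is only a sketch, not the HBase client API: `ToyColumn`, `read_all`, `major_compact`, and the `ms` helper are hypothetical names, and the model assumes the commonly described behavior that a column delete marker hides every version at or below its timestamp until a major compaction removes the marker.

```python
from datetime import datetime, timezone


def ms(year, month, day):
    """Milliseconds since the epoch (UTC), the unit HBase uses for cell timestamps."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


class ToyColumn:
    """Hypothetical stand-in for one (row, family, qualifier) column."""

    def __init__(self):
        self.versions = {}        # timestamp -> value
        self.tombstone_ts = None  # column delete marker, if any

    def put(self, ts, value):
        self.versions[ts] = value

    def delete_column(self, ts):
        # One marker covers every version with a timestamp <= ts.
        if self.tombstone_ts is None or ts > self.tombstone_ts:
            self.tombstone_ts = ts

    def read_all(self):
        """Visible (timestamp, value) pairs, newest first."""
        visible = [
            (ts, value)
            for ts, value in self.versions.items()
            if self.tombstone_ts is None or ts > self.tombstone_ts
        ]
        return sorted(visible, reverse=True)

    def major_compact(self):
        # Assumption: compaction drops masked cells together with the marker.
        if self.tombstone_ts is not None:
            self.versions = {
                ts: v for ts, v in self.versions.items() if ts > self.tombstone_ts
            }
            self.tombstone_ts = None


col = ToyColumn()
col.put(ms(2011, 11, 1), 2.0)   # first release
col.put(ms(2011, 12, 1), 2.5)   # revision
before_delete = col.read_all()

col.delete_column(ms(2012, 1, 1))
after_delete = col.read_all()   # every version is hidden by one marker

col.put(ms(2011, 11, 1), 2.1)   # back-dated reload while the marker exists
still_masked = col.read_all()

col.major_compact()
col.put(ms(2011, 11, 1), 2.1)   # the same put after compaction
after_compaction = col.read_all()
```

In this model the back-dated put is invisible until the marker is compacted away, which is why the outcome of a reload can depend on when compaction happens to run.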
> On Feb 6, 2013, at 3:05 PM, Ian Varley <[EMAIL PROTECTED]> wrote:
>> Alex,
>> This might be an interesting use of the time dimension in HBase. Every value in HBase is uniquely represented by a set of coordinates:
>> - table
>> - row key
>> - column family
>> - column qualifier
>> - timestamp
>> So, you can have two different values that have all the same coordinates, except their timestamp. So for your example, that could be:
>> - table: econ
>> - row key: "indicatorABC"
>> - column family: cf1
>> - column qualifier: "reporting_2011-10-01"
>> first value:
>> - timestamp: "2011-11-01 00:00:00.000"
>> - value: 2
>> second value:
>> - timestamp: "2011-12-01 00:00:00.000"
>> - value: 2.5
>> I.e., if you load the data such that the timestamps on the values represent the release date, then you can model this in a natural way. By default, reads in HBase will only give you the latest value, but you can manually tell a scanner to give you "time travel" by only reporting values as of an older date; so you could say "tell me what the data would have said on 11/01".
>> (Also, by default, HBase only keeps a limited number of historical versions (3), but you can tell it to keep all versions.)
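A minimal sketch of the "time travel" read described above, using a toy in-memory table rather than the HBase client. `ToyTable`, `put`, `get`, `as_of`, and `ms` are hypothetical names chosen for illustration. Two simplifications are assumed: versions beyond `max_versions` are trimmed eagerly on write, and `as_of` is treated as an inclusive bound (HBase time ranges have an exclusive upper bound).

```python
from datetime import datetime, timezone


def ms(year, month, day):
    """Milliseconds since the epoch (UTC), the unit HBase uses for cell timestamps."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


class ToyTable:
    """Hypothetical model: (row, family, qualifier) -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed to be >= 1
        self.cells = {}

    def put(self, row, family, qualifier, ts, value):
        versions = self.cells.setdefault((row, family, qualifier), {})
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, row, family, qualifier, as_of=None):
        """Latest value, or the latest value with timestamp <= as_of."""
        versions = self.cells.get((row, family, qualifier), {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(eligible)] if eligible else None


econ = ToyTable(max_versions=3)
coords = ("indicatorABC", "cf1", "reporting_2011-10-01")

# The timestamp carries the release date; the qualifier carries the reporting month.
econ.put(*coords, ts=ms(2011, 11, 1), value=2.0)
econ.put(*coords, ts=ms(2011, 12, 1), value=2.5)

latest = econ.get(*coords)                                  # default read
as_of_release = econ.get(*coords, as_of=ms(2011, 11, 1))    # view on Nov 1, 2011
before_release = econ.get(*coords, as_of=ms(2011, 10, 15))  # nothing published yet
```

A default read returns the revised value, while passing an earlier `as_of` returns what was known at that date; if more revisions arrive than `max_versions` allows, the oldest ones become unreachable.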
>> There are some downsides to using the time dimension explicitly like this:
>> - If you back date things and also work with deletes, you could get some weird behavior depending on when compaction runs.
>> - If you have lots of versions of things, the server still has to read over these when you scan, which makes things slower. (Probably doesn't apply if you only have a couple historical versions of any given value.)
>> All the usual caveats apply: don't bother with HBase unless you've got some serious size in your data (e.g. TB) and need to support a heavy load of real-time updates and queries. Otherwise, go with something simpler to operate, like a relational database, CouchDB, etc.
>> Ian
>> On Feb 6, 2013, at 2:24 PM, Alex Grund wrote:
>> Hi,
>> I am a newbie to NoSQL databases and I am wondering how to model a
>> specific case with HBase.
>> The things I want to model are economic time series, such as the
>> unemployment rate in a given country.
>> The complicated thing is this: values of an economic time series can,
>> but do not have to, be revised.
>> An example:
>> Imagine the time series is published monthly, on the first day of a
>> month, with the value for the previous month, like so:
>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
>> Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
>> Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
>> (where "release" is the date of release and "reporting" is the date of
>> the month the "value" refers to. Read: "On Dec 1, 2011, the
>> unemployment rate for Nov 2011 was reported to be 1.")
>> Now, imagine, that on every release, the value for the previous month