HBase >> mail # user >> How to design a data warehouse in HBase?


+
bigdata 2012-12-13, 05:57
+
lars hofhansl 2012-12-13, 07:09
+
Michel Segel 2012-12-13, 08:43
RE: How to design a data warehouse in HBase?
Thanks. I think a real example is better for me to understand your suggestions.
Now I have a relational table:

ID   LoginTime             DeviceID
1    2012-12-12 12:12:12   abcdef
2    2012-12-12 19:12:12   abcdef
3    2012-12-13 10:10:10   defdaf

There are several requirements for this table:
1. How many devices log in each day?
2. For a given day, how many new devices log in (devices that have never logged in before)?
3. For a given day, how many accumulated devices have logged in?
How can I design HBase tables to calculate these figures? My current solution is:

table A:
  rowkey: date-deviceid
  column family: login
  column qualifier: 2012-12-12 12:12:12 / 2012-12-12 19:12:12 ...

table B:
  rowkey: deviceid
  column family: null or any value

For req #1, I can scan table A with a PrefixFilter on the rowkey to select one specific date, and count the records.
For req #2, I look up each deviceid in table B and count the results.
For req #3, I count table A with a PrefixFilter, as in req #1.
Is this OK, or are there better solutions?
Thanks!!
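
The two-table design above can be sketched roughly as follows. This is an in-memory Python stand-in, not the HBase client API. The SortedTable class, the first_seen qualifier stored in table B, and the reading of "accumulated" as "distinct devices seen up to and including that day" are all assumptions made for illustration and are not part of the original post.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {qualifier: value}

    def put(self, rowkey, qualifier, value=""):
        if rowkey not in self._rows:
            self._keys.insert(bisect_left(self._keys, rowkey), rowkey)
            self._rows[rowkey] = {}
        self._rows[rowkey][qualifier] = value

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def prefix_scan(self, prefix):
        """Yield (rowkey, row) for every row key starting with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


table_a = SortedTable()  # rowkey: date-deviceid, qualifiers: login times
table_b = SortedTable()  # rowkey: deviceid, first_seen (assumed qualifier)


def record_login(login_time, device_id):
    day = login_time[:10]
    table_a.put(f"{day}-{device_id}", login_time)
    row = table_b.get(device_id)
    # Keep the earliest day on which this device was seen.
    if row is None or day < row["first_seen"]:
        table_b.put(device_id, "first_seen", day)


def devices_on(day):
    """Req #1: one row per device per day, so count the rows for that day."""
    return sum(1 for _ in table_a.prefix_scan(day + "-"))


def new_devices_on(day):
    """Req #2: devices active on that day whose first login was that day."""
    count = 0
    for rowkey, _ in table_a.prefix_scan(day + "-"):
        device_id = rowkey[len(day) + 1:]
        if table_b.get(device_id)["first_seen"] == day:
            count += 1
    return count


def accumulated_devices_until(day):
    """Req #3 (assumed meaning): distinct devices first seen on or before day."""
    return sum(1 for _, row in table_b.prefix_scan("")
               if row["first_seen"] <= day)


# The three sample rows from the relational table above.
record_login("2012-12-12 12:12:12", "abcdef")
record_login("2012-12-12 19:12:12", "abcdef")
record_login("2012-12-13 10:10:10", "defdaf")

print(devices_on("2012-12-12"), new_devices_on("2012-12-13"),
      accumulated_devices_until("2012-12-13"))
```

In this sketch the accumulated count is a full scan of table B, which would be expensive on a large table. Keeping a per-day counter updated at write time might be a cheaper alternative, at the cost of extra writes.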

> CC: [EMAIL PROTECTED]
> From: [EMAIL PROTECTED]
> Subject: Re: How to design a data warehouse in HBase?
> Date: Thu, 13 Dec 2012 08:43:31 +0000
> To: [EMAIL PROTECTED]
>
> You need to spend a bit of time on Schema design.
> You need to flatten your Schema...
> Implement some secondary indexing to improve join performance...
>
> Depends on what you want to do... There are other options too...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 13, 2012, at 7:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > For OLAP type queries you will generally be better off with a truly column oriented database.
> > You can probably shoehorn HBase into this, but it wasn't really designed with raw scan performance along single columns in mind.
> >
> >
> >
> > ________________________________
> > From: bigdata <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Sent: Wednesday, December 12, 2012 9:57 PM
> > Subject: How to design a data warehouse in HBase?
> >
> > Dear all,
> > We have a traditional star-model data warehouse in an RDBMS, and now we want to move it to HBase. After studying HBase, I have learned that HBase is normally queried by rowkey:
> > 1. full rowkey (fastest)
> > 2. rowkey filter (fast)
> > 3. column family/qualifier filter (slow)
> > How can I design the HBase tables to implement the warehouse functions, such as:
> > 1. Query by DimensionA
> > 2. Query by DimensionA and DimensionB
> > 3. Sum, count, distinct ...
> > In my opinion, I should create several HBase tables with all combinations of the different dimensions as the rowkey. This solution will lead to huge data duplication. Are there any good suggestions for solving this?
> > Thanks a lot!
     
+
Mohammad Tariq 2012-12-13, 09:42
+
bigdata 2012-12-13, 09:47
+
Mohammad Tariq 2012-12-13, 10:13
+
bigdata 2012-12-13, 14:28
+
Mohammad Tariq 2012-12-13, 14:44
+
Kevin Odell 2012-12-13, 14:47
+
Mohammad Tariq 2012-12-13, 15:06
+
Kevin Odell 2012-12-13, 15:30
+
Mohammad Tariq 2012-12-13, 15:33
+
Manoj Babu 2012-12-13, 16:38
+
Kevin Odell 2012-12-13, 16:42
+
Michel Segel 2012-12-14, 00:49
+
Michael Segel 2012-12-13, 20:20
+
Asaf Mesika 2012-12-15, 02:14