


RE: How to design a data warehouse in HBase?
Thanks. I think a concrete example would help me understand your suggestions.
Now I have a relational table:

ID  LoginTime            DeviceID
1   2012-12-12 12:12:12  abcdef
2   2012-12-12 19:12:12  abcdef
3   2012-12-13 10:10:10  defdaf
There are several requirements for this table:
1. How many devices log in on each day?
2. For one day, how many new devices log in (devices that have never logged in before)?
3. For one day, what is the accumulated number of devices that have logged in?
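The following minimal plain-Python sketch pins down what the three numbers should be for the sample rows by computing them directly, outside HBase. It assumes that "accumulated" in requirement 3 means the number of distinct devices that logged in on or before the given day. `daily_metrics` and `ROWS` are illustrative names only.

```python
from collections import defaultdict

# The sample rows from the relational table above: (ID, LoginTime, DeviceID)
ROWS = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def daily_metrics(rows):
    """Return {day: (devices_that_day, new_devices, accumulated_devices)}.

    Assumes "accumulated" means distinct devices seen on or before that day.
    """
    devices_by_day = defaultdict(set)
    for _id, login_time, device_id in rows:
        devices_by_day[login_time[:10]].add(device_id)  # key by the YYYY-MM-DD part

    seen = set()
    result = {}
    for day in sorted(devices_by_day):  # ISO dates sort chronologically as strings
        today = devices_by_day[day]
        new = today - seen
        seen |= today
        result[day] = (len(today), len(new), len(seen))
    return result


for day, (devices, new, total) in daily_metrics(ROWS).items():
    print(day, devices, new, total)
```

For the sample rows this prints `2012-12-12 1 1 1` and `2012-12-13 1 1 2`. That gives a reference answer to check any HBase table design against.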
How can I design HBase tables to calculate these numbers? My current solution is:

table A:
    rowkey: date-deviceid
    column family: login
    column qualifier: 2012-12-12 12:12:12 / 2012-12-12 19:12:12 ....

table B:
    rowkey: deviceid
    column family: null or any value

For req#1, I can scan table A with a PrefixFilter on the rowkey for one specific date and count the records.
For req#2, I get table B for each deviceid and count the results.
For req#3, I count table A with a PrefixFilter, as in req#1.
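A small in-memory model can be used to check this proposal. The sketch below is a toy and does not use the HBase client API. `SortedTable` mimics only the property that matters here: HBase keeps rows sorted by rowkey, so a scan with a start row and a stop row touches a single contiguous key range.

The sketch makes these changes and assumptions:
- Table A is kept as proposed: one row per `date-deviceid`, with one qualifier per login time.
- Table B stores each device's first login date under a hypothetical qualifier `first`.
- A hypothetical third table C is keyed `firstdate-deviceid`.
- "Accumulated" is assumed to mean distinct devices seen on or before the day.

With table C, requirement 2 becomes a prefix scan and requirement 3 becomes a range scan from the start of the table. This avoids reading all of table B and avoids double-counting devices that appear in table A on several days.

```python
import bisect


class SortedTable:
    """Toy in-memory stand-in for an HBase table (not the HBase client API):
    rowkeys are kept in lexicographic order, each row maps qualifier -> value."""

    def __init__(self):
        self._keys = []  # rowkeys, always sorted
        self._rows = {}  # rowkey -> {qualifier: value}

    def put(self, rowkey, qualifier, value=""):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
            self._rows[rowkey] = {}
        self._rows[rowkey][qualifier] = value

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def scan(self, start, stop):
        """Return rowkeys in [start, stop), like a scan with start and stop rows."""
        i = bisect.bisect_left(self._keys, start)
        out = []
        while i < len(self._keys) and self._keys[i] < stop:
            out.append(self._keys[i])
            i += 1
        return out


def prefix_stop(prefix):
    """Exclusive stop key just past every key that starts with prefix (ASCII only)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


table_a = SortedTable()  # rowkey "date-deviceid", one qualifier per login time
table_b = SortedTable()  # rowkey "deviceid", hypothetical qualifier "first" = first login date
table_c = SortedTable()  # hypothetical first-seen index, rowkey "firstdate-deviceid"


def record_login(login_time, device_id):
    day = login_time[:10]
    table_a.put(f"{day}-{device_id}", login_time)
    # Check-then-put is not atomic; a real cluster would need e.g. checkAndPut.
    if table_b.get(device_id) is None:
        table_b.put(device_id, "first", day)
        table_c.put(f"{day}-{device_id}", "seen")


def devices_on(day):
    """req#1: table A has exactly one row per (day, device)."""
    p = day + "-"
    return len(table_a.scan(p, prefix_stop(p)))


def new_devices_on(day):
    """req#2: devices whose first login falls on this day."""
    p = day + "-"
    return len(table_c.scan(p, prefix_stop(p)))


def accumulated_devices(day):
    """req#3 (assumed meaning): distinct devices first seen on or before this day."""
    return len(table_c.scan("", prefix_stop(day + "-")))


for login_time, device_id in [("2012-12-12 12:12:12", "abcdef"),
                              ("2012-12-12 19:12:12", "abcdef"),
                              ("2012-12-13 10:10:10", "defdaf")]:
    record_login(login_time, device_id)

for day in ("2012-12-12", "2012-12-13"):
    print(day, devices_on(day), new_devices_on(day), accumulated_devices(day))
```

For the three sample logins this prints `1 1 1` for 2012-12-12 and `1 1 2` for 2012-12-13.

Counting rows under a `date-` prefix of table A answers requirement 1 because a device's repeat logins on the same day land in the same row as extra qualifiers. Two caveats apply to a real cluster:
- The check-then-put in `record_login` would have to be atomic, for example via HBase's checkAndPut.
- A PrefixFilter is generally best paired with an explicit start row, so the scan does not begin at the first row of the table.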
Is this OK? Or are there better solutions?
Thanks!!

> CC: [EMAIL PROTECTED]
> From: [EMAIL PROTECTED]
> Subject: Re: How to design a data warehouse in HBase?
> Date: Thu, 13 Dec 2012 08:43:31 +0000
> To: [EMAIL PROTECTED]
>
> You need to spend a bit of time on Schema design.
> You need to flatten your Schema...
> Implement some secondary indexing to improve join performance...
>
> Depends on what you want to do... There are other options too...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 13, 2012, at 7:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > For OLAP type queries you will generally be better off with a truly column oriented database.
> > You can probably shoehorn HBase into this, but it wasn't really designed with raw scan performance along single columns in mind.
> >
> >
> >
> > ________________________________
> > From: bigdata <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Sent: Wednesday, December 12, 2012 9:57 PM
> > Subject: How to design a data warehouse in HBase?
> >
> > Dear all,
> > We have a traditional star-schema data warehouse in an RDBMS, and now we want to move it to HBase. After studying HBase, I learned that HBase is normally queried by rowkey:
> > 1. full rowkey (fastest)
> > 2. rowkey filter (fast)
> > 3. column family/qualifier filter (slow)
> > How can I design the HBase tables to implement warehouse functions like:
> > 1. Query by DimensionA
> > 2. Query by DimensionA and DimensionB
> > 3. Sum, count, distinct ...
> > In my opinion, I should create several HBase tables, one for each combination of dimensions, with that combination as the rowkey. This solution will lead to huge data duplication. Are there any good suggestions for solving it?
> > Thanks a lot!
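Another toy model makes the duplication concern in the original question concrete. Again, this is not the HBase API. Take a composite rowkey such as `region|product|fact_id`:
- A prefix scan serves a query on the leading dimension or on both dimensions.
- Sum and count can be accumulated while scanning.
- A query on the second dimension alone cannot use the key order, so it needs a second copy of the facts with the dimensions swapped.

The dimension names, the `|` separator, and the sample facts are all hypothetical.

```python
import bisect

SEP = "|"  # assumed separator that never occurs inside a dimension value


def make_key(*parts):
    return SEP.join(parts)


class FactTable:
    """Toy model of one HBase table with a composite-dimension rowkey;
    rows are kept sorted by rowkey, as HBase stores them."""

    def __init__(self):
        self._keys = []    # rowkeys, always sorted
        self._values = {}  # rowkey -> measure

    def put(self, rowkey, amount):
        if rowkey not in self._values:
            bisect.insort(self._keys, rowkey)
        self._values[rowkey] = amount

    def scan_prefix(self, prefix):
        """Yield (rowkey, amount) for the contiguous run of keys sharing prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def sum_and_count(table, *dims):
    """Aggregate over every row whose rowkey starts with the given leading dimensions."""
    total = count = 0
    for _key, amount in table.scan_prefix(make_key(*dims) + SEP):
        total += amount
        count += 1
    return total, count


# Hypothetical facts: (region, product, fact_id, amount)
facts = [("east", "tv", "001", 10), ("east", "radio", "002", 5),
         ("west", "tv", "003", 7)]

by_a_then_b = FactTable()  # rowkey region|product|fact_id
by_b_then_a = FactTable()  # rowkey product|region|fact_id (the duplicate copy)
for region, product, fact_id, amount in facts:
    by_a_then_b.put(make_key(region, product, fact_id), amount)
    by_b_then_a.put(make_key(product, region, fact_id), amount)

print(sum_and_count(by_a_then_b, "east"))        # query by DimensionA -> (15, 2)
print(sum_and_count(by_a_then_b, "east", "tv"))  # DimensionA and DimensionB -> (10, 1)
print(sum_and_count(by_b_then_a, "tv"))          # DimensionB alone needs the copy -> (17, 2)
```

Each additional dimension ordering that needs prefix access means another full copy of the facts. This is why supporting many dimension combinations with rowkeys alone multiplies storage.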
     