HBase >> mail # user >> How to design a data warehouse in HBase?


RE: How to design a data warehouse in HBase?
Hi, Tariq
Thanks for your feedback. Actually, we now have two ways to reach the target: Hive and HBase. Could you tell me why HBase is not a good fit for my requirements? Or what is the problem with my solution?
Thanks.

> From: [EMAIL PROTECTED]
> Date: Thu, 13 Dec 2012 15:43:25 +0530
> Subject: Re: How to design a data warehouse in HBase?
> To: [EMAIL PROTECTED]
>
> Both have different purposes. People often say that Hive is slow, but
> that's just because it uses MapReduce under the hood. And I'm sure that if
> the data stored in HBase is huge, nobody would write sequential
> programs that issue Gets or Scans. Instead they would write MR jobs or do
> something similar.
>
> My point is that nothing can be 100% real time. Is that what you want? If
> so, I would not suggest Hadoop in the first place, as it's a
> batch processing system and cannot be used like an OLTP system unless you
> add something on top of it. Since you are talking about a
> warehouse, I am assuming you are going to store and process gigantic
> amounts of data. That's the only reason I suggested Hive.
>
> The whole point is that no single tool is a solution for everything. One
> size doesn't fit all. First, we need to analyze our particular use case.
> The person who says Hive is slow might be correct, but only for their
> scenario.
>
> HTH
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 3:17 PM, bigdata <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I've heard that Hive's performance is poor: it accesses HDFS files
> > and scans all the data to find a single record. Is that true? And is
> > HBase much faster?
> >
> >
> > > From: [EMAIL PROTECTED]
> > > Date: Thu, 13 Dec 2012 15:12:25 +0530
> > > Subject: Re: How to design a data warehouse in HBase?
> > > To: [EMAIL PROTECTED]
> > >
> > > Hi there,
> > >
> > >    If you are really planning a warehousing solution, I would
> > > suggest you have a look at Apache Hive. It provides warehousing
> > > capabilities on top of a Hadoop cluster. It also provides
> > > an SQL-like interface to the data stored in your warehouse, which would be
> > > very helpful if you are coming from an SQL background.
> > >
> > > HTH
> > >
> > >
> > >
> > > Regards,
> > >     Mohammad Tariq
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 2:43 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > >
> > > > Thanks. I think a real example would help me understand your
> > > > suggestions.
> > > > Here is my relational table:
> > > >
> > > > ID  LoginTime            DeviceID
> > > > 1   2012-12-12 12:12:12  abcdef
> > > > 2   2012-12-12 19:12:12  abcdef
> > > > 3   2012-12-13 10:10:10  defdaf
> > > >
> > > > There are several requirements for this table:
> > > > 1. How many devices log in each day?
> > > > 2. For a given day, how many new devices log in (devices that have never logged in before)?
> > > > 3. For a given day, how many accumulated device logins are there?
> > > >
> > > > How can I design HBase tables to calculate these numbers? My current
> > > > solution is:
> > > > table A:
> > > >   rowkey: date-deviceid
> > > >   column family: login
> > > >   column qualifier: 2012-12-12 12:12:12/2012-12-12 19:12:12...
> > > > table B:
> > > >   rowkey: deviceid
> > > >   column family: null or any value
> > > >
> > > > For req #1, I can scan table A with a PrefixFilter on the row key for a
> > > > specific date and count the returned records.
> > > > For req #2, I look up each device ID in table B and count the results.
> > > > For req #3, I count table A with a prefix filter, as in req #1.
> > > > Is this OK? Or is there a better solution?
> > > > Thanks!!
> > > >
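A minimal sketch of the table A / table B design and the three counts, for illustration only. It does not use the HBase client API: a `java.util.TreeMap` stands in for each table because HBase also keeps row keys in sorted order, which is what makes a row-key prefix scan cheap. The class and method names (`LoginWarehouseSketch`, `recordLogin`, `rowsForDay`, and so on) are hypothetical, as is the assumption that logins arrive in time order so that table B can flag first-time devices for req #2. Req #3 is ambiguous, so the sketch shows two readings: total login events on the day, and distinct devices seen up to that day.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * In-memory model of the two-table design (NOT the HBase client API).
 * TreeMap keeps keys sorted, like an HBase table, so a prefix scan touches only matching rows.
 */
class LoginWarehouseSketch {
    // Table A: row key "yyyy-MM-dd-deviceid" -> login times (qualifiers in family "login").
    private final TreeMap<String, TreeSet<String>> tableA = new TreeMap<>();
    // Table B: row key deviceid -> presence marker; the value itself is never read.
    private final TreeMap<String, Boolean> tableB = new TreeMap<>();
    // Hypothetical helper: count of first-time devices per day, filled at load time (req #2).
    private final TreeMap<String, Integer> newDevicesByDay = new TreeMap<>();

    /** Records one login; loginTime is "yyyy-MM-dd HH:mm:ss" and calls are assumed to arrive in time order. */
    void recordLogin(String deviceId, String loginTime) {
        String day = loginTime.substring(0, 10);
        tableA.computeIfAbsent(day + "-" + deviceId, k -> new TreeSet<>()).add(loginTime);
        // A device is new if table B has no row for it yet; putIfAbsent returns null in that case.
        if (tableB.putIfAbsent(deviceId, Boolean.TRUE) == null) {
            newDevicesByDay.merge(day, 1, Integer::sum);
        }
    }

    /** Rows of table A whose key starts with "day-", like a Scan with a start row plus PrefixFilter. */
    private TreeMap<String, TreeSet<String>> rowsForDay(String day) {
        String prefix = day + "-";
        TreeMap<String, TreeSet<String>> rows = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> e : tableA.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later row can match the prefix
            }
            rows.put(e.getKey(), e.getValue());
        }
        return rows;
    }

    /** req #1: distinct devices that logged in on the day (one table-A row per device per day). */
    int devicesOnDay(String day) {
        return rowsForDay(day).size();
    }

    /** req #2: devices whose first-ever login fell on the day. */
    int newDevicesOnDay(String day) {
        return newDevicesByDay.getOrDefault(day, 0);
    }

    /** One reading of req #3: total login events on the day (cells under the day prefix). */
    int loginEventsOnDay(String day) {
        int total = 0;
        for (TreeSet<String> times : rowsForDay(day).values()) {
            total += times.size();
        }
        return total;
    }

    /** Another reading of req #3: distinct devices seen on or before the day. */
    int cumulativeDevices(String day) {
        int total = 0;
        for (int n : newDevicesByDay.headMap(day, true).values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        LoginWarehouseSketch w = new LoginWarehouseSketch();
        // The three example rows from the relational table in the thread.
        w.recordLogin("abcdef", "2012-12-12 12:12:12");
        w.recordLogin("abcdef", "2012-12-12 19:12:12");
        w.recordLogin("defdaf", "2012-12-13 10:10:10");
        for (String day : new String[] {"2012-12-12", "2012-12-13"}) {
            System.out.println(day + " devices=" + w.devicesOnDay(day) + " new=" + w.newDevicesOnDay(day)
                    + " logins=" + w.loginEventsOnDay(day) + " toDate=" + w.cumulativeDevices(day));
        }
    }
}
```

On the three example rows, `main` should print `devices=1 new=1 logins=2 toDate=1` for 2012-12-12 and `devices=1 new=1 logins=1 toDate=2` for 2012-12-13. In a real cluster, `rowsForDay` corresponds to a Scan whose start row is the date prefix plus a `PrefixFilter`; for very large tables the counting would more likely run as a MapReduce job, as Tariq suggests. If logins can arrive out of order, table B would need to store each device's first-login date rather than a bare presence marker.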
> > > > > CC: [EMAIL PROTECTED]
> > > > > From: [EMAIL PROTECTED]
> > > > > Subject: Re: How to design a data warehouse in HBase?
> > > > > Date: Thu, 13 Dec 2012 08:43:31 +0000
> > > > > To: [EMAIL PROTECTED]
> > > > >
> > > > > You need to spend a bit of time on schema design.
> > > > > You need to flatten your schema...
> > > > > Implement some secondary indexing to improve join performance...
> > > > >
> >      
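One possible reading of the "secondary indexing" advice, sketched as a plain Java model with sorted maps rather than real HBase tables: keep an index table whose row key reverses the primary key's components (`deviceid-date` instead of `date-deviceid`) and write an index row alongside every primary row. Questions keyed on the device then become a prefix scan over the index instead of a full scan of the primary table. `SecondaryIndexSketch` and its methods are hypothetical names, and device IDs are assumed not to contain `-`, so the key separator stays unambiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hand-maintained secondary index modeled with sorted maps (NOT the HBase client API).
 * Primary rows: "date-deviceid". Index rows: "deviceid-date", pointing back to the primary key.
 */
class SecondaryIndexSketch {
    // Primary table: "date-deviceid" -> number of logins that day.
    private final TreeMap<String, Integer> primary = new TreeMap<>();
    // Index table: "deviceid-date" -> primary row key it points back to.
    private final TreeMap<String, String> index = new TreeMap<>();

    /** Writes the primary row and its index entry; in HBase these would be two separate, non-atomic Puts. */
    void recordLogin(String date, String deviceId) {
        String primaryKey = date + "-" + deviceId;
        primary.merge(primaryKey, 1, Integer::sum);
        index.put(deviceId + "-" + date, primaryKey);
    }

    /** Logins for one device on one day, read from the primary table by exact row key. */
    int loginCount(String date, String deviceId) {
        return primary.getOrDefault(date + "-" + deviceId, 0);
    }

    /** Days a device logged in, found by scanning only the index rows that share the device's prefix. */
    List<String> daysForDevice(String deviceId) {
        String prefix = deviceId + "-";
        List<String> days = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted keys: all of this device's index rows are contiguous
            }
            days.add(e.getKey().substring(prefix.length()));
        }
        return days;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        // The example logins from the thread, reduced to (date, device).
        s.recordLogin("2012-12-12", "abcdef");
        s.recordLogin("2012-12-12", "abcdef");
        s.recordLogin("2012-12-13", "defdaf");
        for (String device : new String[] {"abcdef", "defdaf"}) {
            for (String day : s.daysForDevice(device)) {
                System.out.println(device + " " + day + " logins=" + s.loginCount(day, device));
            }
        }
    }
}
```

For the example data, `main` should print `abcdef 2012-12-12 logins=2` and `defdaf 2012-12-13 logins=1`. The price of this design is extra writes and the work of keeping the two tables consistent, since HBase does not update rows in two different tables atomically.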