HBase >> mail # user >> How to design a data warehouse in HBase?


bigdata 2012-12-13, 14:28
Re: How to design a data warehouse in HBase?
I am not saying HBase is not good. My point was to consider Hive as well.
Think about the approach with both tools in mind, then decide. I only
suggested an option based on Hive's built-in features. One more point:
you can map your HBase tables to Hive.

Regards,
    Mohammad Tariq

On Thu, Dec 13, 2012 at 7:58 PM, bigdata <[EMAIL PROTECTED]> wrote:

> Hi, Tariq
> Thanks for your feedback. Actually, we now have two ways to reach the
> target: Hive and HBase. Could you tell me why HBase is not good for
> my requirements? Or what is the problem with my solution?
> Thanks.
>
> > From: [EMAIL PROTECTED]
> > Date: Thu, 13 Dec 2012 15:43:25 +0530
> > Subject: Re: How to design a data warehouse in HBase?
> > To: [EMAIL PROTECTED]
> >
> > Both have different purposes. People normally say that Hive is slow,
> > but that is just because it uses MapReduce under the hood. And I'm sure
> > that if the data stored in HBase is very large, nobody would write
> > sequential programs for Get or Scan. Instead they will write MapReduce
> > jobs or do something similar.
> >
> > My point is that nothing can be 100% real time. Is that what you want? If
> > that is the case, I would never suggest Hadoop in the first place, as it is
> > a batch processing system and cannot be used like an OLTP system unless
> > you have thought of some additional stuff. Since you are talking about a
> > warehouse, I am assuming you are going to store and process gigantic
> > amounts of data. That is the only reason I suggested Hive.
> >
> > The whole point is that no tool is a solution for everything. One
> > size doesn't fit all. First, we need to analyze our particular use case.
> > The person who says Hive is slow might be correct, but only for their
> > scenario.
> >
> > HTH
> >
> > Regards,
> >     Mohammad Tariq
> >
> >
> >
> > On Thu, Dec 13, 2012 at 3:17 PM, bigdata <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > I've heard that Hive's performance is too low, because it accesses
> > > HDFS files and scans all the data to find one record, and that HBase
> > > is much faster. Is that true?
> > >
> > >
> > > > From: [EMAIL PROTECTED]
> > > > Date: Thu, 13 Dec 2012 15:12:25 +0530
> > > > Subject: Re: How to design a data warehouse in HBase?
> > > > To: [EMAIL PROTECTED]
> > > >
> > > > Hi there,
> > > >
> > > >    If you are really planning a warehousing solution, then I would
> > > > suggest you have a look at Apache Hive. It provides warehousing
> > > > capabilities on top of a Hadoop cluster. It also provides a SQL-like
> > > > interface to the data stored in your warehouse, which would be
> > > > very helpful to you if you are coming from an SQL background.
> > > >
> > > > HTH
> > > >
> > > >
> > > >
> > > > Regards,
> > > >     Mohammad Tariq
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 2:43 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thanks. I think a real example would help me understand your
> > > > > suggestions.
> > > > > Now I have a relational table:
> > > > >
> > > > > ID   LoginTime             DeviceID
> > > > > 1    2012-12-12 12:12:12   abcdef
> > > > > 2    2012-12-12 19:12:12   abcdef
> > > > > 3    2012-12-13 10:10:10   defdaf
> > > > >
> > > > > There are several requirements for this table:
> > > > > 1. How many devices log in each day?
> > > > > 2. For one day, how many new devices log in (never logged in before)?
> > > > > 3. For one day, how many accumulated device logins?
> > > > >
> > > > > How can I design HBase tables to calculate these data? My current
> > > > > solution is:
> > > > >
> > > > > table A:
> > > > >   rowkey: date-deviceid
> > > > >   column family: login
> > > > >   column qualifier: 2012-12-12 12:12:12/2012-12-12 19:12:12....
> > > > >
> > > > > table B:
> > > > >   rowkey: deviceid
> > > > >   column family: null or any value
> > > > >
> > > > > For req#1, I can scan table A and use prefixfilter(rowkey) to check one
> > > > > specific date, and get the record count.
> > > > > For req#2, I get table b with each