HBase, mail # user - How to design a data warehouse in HBase?

Re: How to design a data warehouse in HBase?
Kevin O'dell 2012-12-13, 14:47
To Mohammad's point: you can use HBase for quick scans of the data, Hive
for your longer-running jobs, and Impala over both for quick ad hoc queries.
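Here is a minimal sketch of why well-chosen row keys make HBase scans quick. HBase stores rows sorted by row key, so all of one day's logins can be read with a single bounded scan. The `<day>#<deviceID>` key layout, the `SortedTable` class and the `prefix_bounds` helper are hypothetical. The in-memory table only mimics HBase's key ordering and is not the HBase client API.

```python
import bisect

# Hypothetical row-key layout (an assumption, not from the thread):
#   "<yyyy-mm-dd>#<deviceID>"
# Rows sorted by key keep one day's rows contiguous, so a bounded scan reads them.


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        # Keeps only the latest value per key (similar to reading the newest
        # cell version in HBase).
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def scan(self, start_row, stop_row):
        """Yield (key, value) for start_row <= key < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def prefix_bounds(prefix):
    # Exclusive stop row: the prefix with its last character incremented.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


logins = [
    ("2012-12-12 12:12:12", "abcdef"),
    ("2012-12-12 19:12:12", "abcdef"),
    ("2012-12-13 10:10:10", "defdaf"),
]

table = SortedTable()
for login_time, device_id in logins:
    day = login_time[:10]  # "YYYY-MM-DD"
    table.put(f"{day}#{device_id}", login_time)

start, stop = prefix_bounds("2012-12-12#")
devices_on_day = [key.split("#", 1)[1] for key, _ in table.scan(start, stop)]
print(devices_on_day)  # ['abcdef']
```

The device ID is part of the row key, so the second login from `abcdef` on the same day lands on the same row. The scan therefore returns distinct devices per day without any extra deduplication.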

On Thu, Dec 13, 2012 at 9:44 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> I am not saying HBase is not good. My point was to consider Hive as well.
> Think about the approach with both tools in mind, and then decide. I just
> provided an option based on the built-in features Hive offers. I would
> like to add one more point here: you can map your HBase tables to Hive.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 7:58 PM, bigdata <[EMAIL PROTECTED]> wrote:
>
> > Hi Tariq,
> > Thanks for your feedback. Actually, we now have two ways to reach the
> > target: Hive and HBase. Could you tell me why HBase is not good for my
> > requirements, or what the problem with my solution is?
> > Thanks.
> >
> > > From: [EMAIL PROTECTED]
> > > Date: Thu, 13 Dec 2012 15:43:25 +0530
> > > Subject: Re: How to design a data warehouse in HBase?
> > > To: [EMAIL PROTECTED]
> > >
> > > Both have different purposes. People often say that Hive is slow, but
> > > that's just because it uses MapReduce under the hood. And I'm sure that
> > > if the data stored in HBase is very large, nobody would write sequential
> > > programs that issue Get or Scan calls. Instead, they would write MapReduce
> > > (MR) jobs or something similar.
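The MapReduce approach mentioned above, used instead of sequential Get/Scan loops, can be sketched in plain Python. This only simulates the map, shuffle and reduce phases inside one process. It is not the Hadoop API, and the function names are hypothetical. The example job counts distinct devices per day.

```python
from collections import defaultdict


def map_phase(record):
    """Map: one login row -> (key, value) pairs; key = day, value = deviceID."""
    login_time, device_id = record
    yield login_time[:10], device_id


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(day, device_ids):
    """Reduce: count the distinct devices seen on one day."""
    return day, len(set(device_ids))


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(day, ids) for day, ids in sorted(groups.items()))


logins = [
    ("2012-12-12 12:12:12", "abcdef"),
    ("2012-12-12 19:12:12", "abcdef"),
    ("2012-12-13 10:10:10", "defdaf"),
]
print(run_job(logins))  # {'2012-12-12': 1, '2012-12-13': 1}
```

On a real cluster the map and reduce calls run in parallel across many machines, which is what makes this pattern practical for large tables.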
> > >
> > > My point is that nothing can be 100% real time. Is that what you want?
> > > If so, I would never suggest Hadoop in the first place, as it's a batch
> > > processing system and cannot be used like an OLTP system unless you
> > > have planned for some additional components. Since you are talking about
> > > a warehouse, I am assuming you are going to store and process very large
> > > amounts of data. That's the only reason I suggested Hive.
> > >
> > > The whole point is that no single tool is the solution for everything.
> > > One size doesn't fit all. First, we need to analyze our particular use
> > > case. The person who says Hive is slow might be correct, but only for
> > > his scenario.
> > >
> > > HTH
> > >
> > > Regards,
> > >     Mohammad Tariq
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 3:17 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > I've heard that Hive's performance is very poor: it accesses the HDFS
> > > > files and scans all of the data to find a single record, and HBase is
> > > > much faster. Is that true?
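The "scans all data to find one record" concern can be made concrete with a simplified model that contrasts a full scan with a lookup on sorted keys. It assumes a table with no partitioning or index on the Hive side (Hive partitions can prune much of this work). It also ignores HBase's regions, block indexes and Bloom filters. The functions are hypothetical illustrations, not either system's API.

```python
def full_scan(rows, wanted_id):
    """Read every row and keep the matches; returns (matches, rows_read)."""
    matches = [row for row in rows if row[0] == wanted_id]
    return matches, len(rows)


def keyed_lookup(sorted_rows, wanted_id):
    """Binary search over rows sorted by key; returns (row or None, rows_read)."""
    lo, hi, reads = 0, len(sorted_rows), 0
    while lo < hi:
        mid = (lo + hi) // 2
        reads += 1
        key = sorted_rows[mid][0]
        if key == wanted_id:
            return sorted_rows[mid], reads
        if key < wanted_id:
            lo = mid + 1
        else:
            hi = mid
    return None, reads


# 100,000 synthetic rows, already sorted by their integer key.
rows = [(row_id, f"device{row_id % 10}") for row_id in range(100_000)]

print(full_scan(rows, 76_543)[1])     # 100000
print(keyed_lookup(rows, 76_543)[1])  # at most 17 for 100,000 rows
```

The gap grows with table size: a full scan reads every row, while a sorted-key lookup reads roughly log2(n) of them.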
> > > >
> > > >
> > > > > From: [EMAIL PROTECTED]
> > > > > Date: Thu, 13 Dec 2012 15:12:25 +0530
> > > > > Subject: Re: How to design a data warehouse in HBase?
> > > > > To: [EMAIL PROTECTED]
> > > > >
> > > > > Hi there,
> > > > >
> > > > > If you are really planning a warehousing solution, then I would
> > > > > suggest having a look at Apache Hive. It provides warehousing
> > > > > capabilities on top of a Hadoop cluster. It also provides an
> > > > > SQL-like interface to the data stored in your warehouse, which would
> > > > > be very helpful if you are coming from an SQL background.
> > > > >
> > > > > HTH
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >     Mohammad Tariq
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 13, 2012 at 2:43 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Thanks. I think a real example would help me understand your
> > > > > > suggestions.
> > > > > > Now I have a relational table:
> > > > > >
> > > > > > ID   LoginTime             DeviceID
> > > > > > 1    2012-12-12 12:12:12   abcdef
> > > > > > 2    2012-12-12 19:12:12   abcdef
> > > > > > 3    2012-12-13 10:10:10   defdaf
> > > > > >
> > > > > > There are several requirements for this table:
> > > > > > 1. How many devices log in on each day?
> > > > > > 2. For one day, how many new devices log in (devices that have
> > > > > >    never logged in before)?
> > > > > > 3. For one day, how many devices have logged in, accumulated up to
> > > > > >    that day?
> > > > > > How can I design HBase tables to calculate these values? Now my
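Here is a minimal in-memory sketch of the three requirements, using the sample rows from the table. In HBase terms, one hypothetical layout is a table keyed by deviceID that stores the day of the device's first login (written only when the device is first seen), plus a per-day record of distinct devices. This code only models that idea in Python. It assumes that "accumulated" means the running total of distinct devices up to and including each day.

```python
from collections import defaultdict
from itertools import accumulate

logins = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def daily_device_stats(rows):
    """Return {day: (active_devices, new_devices, accumulated_devices)}."""
    devices_by_day = defaultdict(set)  # day -> distinct devices seen that day
    first_day = {}                     # deviceID -> day of its first login
    for _id, login_time, device_id in rows:
        day = login_time[:10]          # "YYYY-MM-DD"
        devices_by_day[day].add(device_id)
        if device_id not in first_day or day < first_day[device_id]:
            first_day[device_id] = day
    new_by_day = defaultdict(int)
    for day in first_day.values():
        new_by_day[day] += 1
    days = sorted(devices_by_day)
    # Assumption: "accumulated" = running total of new (first-seen) devices.
    running_totals = accumulate(new_by_day[d] for d in days)
    return {
        day: (len(devices_by_day[day]), new_by_day[day], total)
        for day, total in zip(days, running_totals)
    }


for day, (active, new, total) in daily_device_stats(logins).items():
    print(day, active, new, total)
# 2012-12-12 1 1 1
# 2012-12-13 1 1 2
```

Keeping the first-login day per device is what makes requirement 2 cheap. A device counts as new on exactly one day, so requirement 3 reduces to a running sum.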

Kevin O'Dell
Customer Operations Engineer, Cloudera