HBase user mailing list: How to design a data warehouse in HBase?


bigdata 2012-12-13, 05:57
lars hofhansl 2012-12-13, 07:09
Michel Segel 2012-12-13, 08:43
bigdata 2012-12-13, 09:13
Mohammad Tariq 2012-12-13, 09:42
bigdata 2012-12-13, 09:47
Mohammad Tariq 2012-12-13, 10:13
bigdata 2012-12-13, 14:28
Mohammad Tariq 2012-12-13, 14:44
Kevin Odell 2012-12-13, 14:47
Mohammad Tariq 2012-12-13, 15:06
Kevin Odell 2012-12-13, 15:30
Mohammad Tariq 2012-12-13, 15:33
Re: How to design a data warehouse in HBase?
Kevin,

Impala requires Hive, right?
So to get the advantages of Impala, do we need to go with Hive?
Cheers!
Manoj.

On Thu, Dec 13, 2012 at 9:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Thank you so much for the clarification, Kevin.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 9:00 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>
> > Mohammad,
> >
> >   I am not sure you are thinking about Impala correctly. It still uses
> > HDFS, so your data increasing over time is fine. You are not going to
> > need to tune for special CPU, storage, or network. Typically with Impala
> > you are going to be bound at the disks, as it relies on data locality.
> > You can also use Snappy, GZip, or BZip2 compression to reduce the
> > amount of data you are storing. You will not need to frequently update
> > your hardware.
> >
> > On Thu, Dec 13, 2012 at 10:06 AM, Mohammad Tariq <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Oh yes, Impala. Good point by Kevin.
> > >
> > > Kevin: Would it be appropriate to say that I should go for Impala if my
> > > data is not going to increase dramatically over time, or if I only have
> > > to work on a subset of my big data? Since Impala uses MPP, it may
> > > require specialized hardware tuned for CPU, storage and network
> > > performance for better results, which could become a problem if I have
> > > to upgrade the hardware frequently because of the growing data.
> > >
> > > Regards,
> > >     Mohammad Tariq
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 8:17 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> > >
> > > > To Mohammad's point: you can use HBase for quick scans of the data,
> > > > Hive for your longer-running jobs, and Impala over the two for quick
> > > > ad hoc searches.
> > > >
> > > > On Thu, Dec 13, 2012 at 9:44 AM, Mohammad Tariq <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > I am not saying HBase is not good. My point was to consider Hive as
> > > > > well. Think about the approach with both tools in mind, then decide.
> > > > > I just offered an option based on the built-in features Hive
> > > > > provides. I would like to add one more point here: you can map your
> > > > > HBase tables to Hive.
> > > > >
> > > > > Regards,
> > > > >     Mohammad Tariq
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 13, 2012 at 7:58 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi Tariq,
> > > > > > Thanks for your feedback. We now have two ways to reach the target:
> > > > > > Hive and HBase. Could you tell me why HBase is not a good fit for my
> > > > > > requirements, or what the problem is with my solution?
> > > > > > Thanks.
> > > > > >
> > > > > > > From: [EMAIL PROTECTED]
> > > > > > > Date: Thu, 13 Dec 2012 15:43:25 +0530
> > > > > > > Subject: Re: How to design a data warehouse in HBase?
> > > > > > > To: [EMAIL PROTECTED]
> > > > > > >
> > > > > > > Both have different purposes. People often say that Hive is
> > > > > > > slow; that's just because it uses MapReduce under the hood. And
> > > > > > > I'm sure that if the data stored in HBase is very large, nobody
> > > > > > > would write sequential programs using Get or Scan. Instead they
> > > > > > > would write MapReduce jobs or do something similar.
> > > > > > >
> > > > > > > My point is that nothing can be 100% real time. Is that what you
> > > > > > > want? If so, I would never suggest Hadoop in the first place, as
> > > > > > > it's a batch processing system and cannot be used like an OLTP
> > > > > > > system unless you have built something additional on top of it.
> > > > > > > Since you are talking about a warehouse, I am assuming you are
> > > > > > > going to store and process gigantic amounts of data. That's the
> > > > > > > only reason I suggested Hive.
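Kevin's remark that compression helps with the amount of data stored can be made concrete with a small sketch. It compresses a block of hypothetical, repetitive CSV-style rows (the row format is made up for illustration) with Python's standard-library `gzip` and `bz2` modules and reports the sizes. Snappy is not in the Python standard library, so it is left out; on a cluster the codec would typically be configured at the storage layer rather than called from application code like this.

```python
import bz2
import gzip
from typing import Dict

# Hypothetical fact-table rows: delimited text with many repeated values,
# the kind of data that tends to compress well.
rows = "\n".join(
    f"2012-12-13,store_{i % 10},product_{i % 50},{i % 7}" for i in range(10_000)
).encode("utf-8")


def compressed_sizes(data: bytes) -> Dict[str, int]:
    """Return the size of `data` in bytes, raw and after each stdlib codec."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


if __name__ == "__main__":
    for codec, size in compressed_sizes(rows).items():
        print(f"{codec:>5}: {size:,} bytes")
```

The exact numbers depend on the data and library versions, so none are quoted here. As a rule of thumb, GZip and BZip2 spend more CPU time than Snappy in exchange for a better compression ratio.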
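Mohammad's point that HBase tables can be mapped to Hive refers to Hive's HBase integration: an external Hive table is declared with the `HBaseStorageHandler`, and its `hbase.columns.mapping` property lines Hive columns up, position by position, with the HBase row key (`:key`) and `family:qualifier` columns. The helper here, `hive_hbase_ddl`, is a hypothetical convenience function that only assembles such a statement as a string. The table names, the column family `d`, and the column types are assumptions for illustration, and the generated HiveQL would still have to be run in Hive against an existing HBase table.

```python
from typing import List, Tuple


def hive_hbase_ddl(hive_table: str, hbase_table: str,
                   key: Tuple[str, str],
                   columns: List[Tuple[str, str, str]]) -> str:
    """Build HiveQL that exposes an existing HBase table as an external Hive table.

    key     -- (hive_column_name, hive_type) for the HBase row key
    columns -- (hive_column_name, hive_type, "family:qualifier") per mapped column
    """
    # Hive column list: the row key first, then each mapped HBase column.
    hive_cols = [f"{key[0]} {key[1]}"] + [f"{name} {typ}" for name, typ, _ in columns]
    # The mapping is positional, so ":key" pairs with the first Hive column.
    mapping = ",".join([":key"] + [fq for _, _, fq in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(hive_cols)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


if __name__ == "__main__":
    # Hypothetical HBase table "sales" with column family "d".
    print(hive_hbase_ddl("sales_hive", "sales", ("rowkey", "STRING"),
                         [("store", "STRING", "d:store"),
                          ("amount", "DOUBLE", "d:amount")]))
```

Using `EXTERNAL` leaves the HBase table's lifecycle to HBase; once mapped, the same data can be queried from Hive while HBase still serves quick key-based reads.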
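Mohammad's point that, for very large HBase tables, people write MapReduce jobs rather than sequential Get/Scan programs comes down to splitting the work by row-key range. HBase keeps rows sorted by key and divides a table into regions, and by default `TableInputFormat` gives each region its own map task. This toy sketch simulates that pattern in plain Python: hypothetical rows are partitioned at assumed region start keys, each region is counted independently (the "map" step), and the partial counts are merged (the "reduce" step). The thread pool only stands in for a cluster; it shows the shape of the computation, not real speed-ups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Tuple

Row = Tuple[str, str]  # (row_key, store): hypothetical schema

# Hypothetical table, kept sorted by row key the way HBase stores rows.
ROWS: List[Row] = sorted((f"row{i:05d}", f"store_{i % 3}") for i in range(9_000))


def split_into_regions(rows: List[Row], start_keys: List[str]) -> List[List[Row]]:
    """Partition key-sorted rows at the given region start keys."""
    regions: List[List[Row]] = [[] for _ in range(len(start_keys) + 1)]
    idx = 0
    for key, value in rows:
        # Advance past every region whose start key this row has reached.
        while idx < len(start_keys) and key >= start_keys[idx]:
            idx += 1
        regions[idx].append((key, value))
    return regions


def map_region(region: List[Row]) -> Counter:
    """'Map' step: count rows per store inside a single region."""
    return Counter(store for _, store in region)


def sequential_scan(rows: List[Row]) -> Counter:
    """Baseline: one client walks every row in key order, like a single Scan."""
    return Counter(store for _, store in rows)


def parallel_aggregate(rows: List[Row], start_keys: List[str],
                       workers: int = 4) -> Counter:
    """Run map_region on every region concurrently, then merge the partials."""
    regions = split_into_regions(rows, start_keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_region, regions))
    # 'Reduce' step: add the per-region counts together.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    start_keys = ["row03000", "row06000"]  # assumed region boundaries
    assert parallel_aggregate(ROWS, start_keys) == sequential_scan(ROWS)
    print(parallel_aggregate(ROWS, start_keys))
```

Both paths give the same totals; the difference is that the per-region work is independent, which is what lets a MapReduce job spread it across many machines instead of one client.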

Kevin Odell 2012-12-13, 16:42
Michel Segel 2012-12-14, 00:49
Michael Segel 2012-12-13, 20:20
Asaf Mesika 2012-12-15, 02:14