HBase >> mail # user >> How to design a data warehouse in HBase?


bigdata 2012-12-13, 05:57
lars hofhansl 2012-12-13, 07:09
Michel Segel 2012-12-13, 08:43
bigdata 2012-12-13, 09:13
Mohammad Tariq 2012-12-13, 09:42
bigdata 2012-12-13, 09:47
Mohammad Tariq 2012-12-13, 10:13
bigdata 2012-12-13, 14:28
Mohammad Tariq 2012-12-13, 14:44
Kevin Odell 2012-12-13, 14:47
Mohammad Tariq 2012-12-13, 15:06
Kevin Odell 2012-12-13, 15:30
Mohammad Tariq 2012-12-13, 15:33
Manoj Babu 2012-12-13, 16:38
Re: How to design a data warehouse in HBase?
Correct, Impala relies on the Hive Metastore.

On Thu, Dec 13, 2012 at 11:38 AM, Manoj Babu <[EMAIL PROTECTED]> wrote:

> Kevin,
>
> Impala requires Hive, right?
> So to get the advantages of Impala, do we need to go with Hive?
>
>
> Cheers!
> Manoj.
>
>
>
> On Thu, Dec 13, 2012 at 9:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
> > Thank you so much for the clarification Kevin.
> >
> > Regards,
> >     Mohammad Tariq
> >
> >
> >
> > On Thu, Dec 13, 2012 at 9:00 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> >
> > > Mohammad,
> > >
> > >   I am not sure you are thinking about Impala correctly.  It still uses
> > > HDFS, so your data increasing over time is fine.  You are not going to
> > > need to tune for special CPU, storage, or network.  Typically with
> > > Impala you are going to be bound at the disks, as it relies on data
> > > locality.  You can also use Snappy, GZip, or BZip compression to reduce
> > > the amount of data you are storing.  You will not need to frequently
> > > update your hardware.
> > >
> > > On Thu, Dec 13, 2012 at 10:06 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> > >
> > > > Oh yes, Impala. Good point by Kevin.
> > > >
> > > > Kevin: Would it be appropriate to say that I should go for Impala if
> > > > my data is not going to increase dramatically over time, or if I have
> > > > to work on only a subset of my big data? Since Impala uses MPP, it may
> > > > require specialized hardware tuned for CPU, storage, and network
> > > > performance for better results, which could become a problem if I have
> > > > to upgrade the hardware frequently because of the growing data.
> > > >
> > > > Regards,
> > > >     Mohammad Tariq
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 8:17 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > To Mohammad's point: you can use HBase for quick scans of the data,
> > > > > Hive for your longer-running jobs, and Impala over the two for quick
> > > > > ad hoc searches.
> > > > >
> > > > > On Thu, Dec 13, 2012 at 9:44 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I am not saying HBase is not good. My point was to consider Hive as
> > > > > > well. Think about the approach keeping both tools in mind, and then
> > > > > > decide. I just provided an option, keeping in mind the available
> > > > > > built-in Hive features. I would like to add one more point here:
> > > > > > you can map your HBase tables to Hive.
> > > > > >
> > > > > > Regards,
> > > > > >     Mohammad Tariq
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 13, 2012 at 7:58 PM, bigdata <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Hi, Tariq
> > > > > > > Thanks for your feedback. Actually, we now have two ways to reach
> > > > > > > the target: by Hive and by HBase. Could you tell me why HBase is
> > > > > > > not good for my requirements? Or what's the problem in my solution?
> > > > > > > Thanks.
> > > > > > >
> > > > > > > > From: [EMAIL PROTECTED]
> > > > > > > > Date: Thu, 13 Dec 2012 15:43:25 +0530
> > > > > > > > Subject: Re: How to design a data warehouse in HBase?
> > > > > > > > To: [EMAIL PROTECTED]
> > > > > > > >
> > > > > > > > Both have got different purposes. Normally people say that Hive
> > > > > > > > is slow; that's just because it uses MapReduce under the hood.
> > > > > > > > And I'm sure that if the data stored in HBase is very large,
> > > > > > > > nobody would write sequential programs for Get or Scan. Instead
> > > > > > > > they will write MapReduce jobs or do something similar.
> > > > > > > >
> > > > > > > > My point is that nothing can be 100% real time. Is that what you
> > > > > > > > want? If that is the case I would never suggest Hadoop on the first

Kevin O'Dell
Customer Operations Engineer, Cloudera
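
A side note on the compression point in the quoted reply above: the effect of a codec on stored size can be estimated before loading anything into the cluster. The following is only a rough sketch under stated assumptions. It uses Python's standard-library gzip and bz2 modules as stand-ins for the GZip and BZip codecs mentioned in the thread (Snappy is left out because it is not in the standard library), and both the sample rows and the helper name compressed_sizes are hypothetical, not taken from the thread. Real ratios depend heavily on the actual data and file format.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the raw size and the gzip/bz2 compressed sizes of data, in bytes."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
    }


# Hypothetical sample: repetitive, log-like rows such as a warehouse fact
# table might hold. Replace this with a slice of real data for a useful estimate.
sample = b"".join(
    b"2012-12-13,user%05d,page_view,us-east\n" % (i % 500) for i in range(20000)
)

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    # Print each size and its share of the uncompressed size.
    print(f"{name:>4}: {size:>8} bytes ({size / sizes['raw']:.1%} of raw)")
```

Running this on a representative sample gives a first impression of how much disk space each codec might save. It says nothing about query speed, which also depends on how expensive each codec is to decompress.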
Michel Segel 2012-12-14, 00:49
Michael Segel 2012-12-13, 20:20
Asaf Mesika 2012-12-15, 02:14