HBase, mail # user - How to design a data warehouse in HBase?


bigdata 2012-12-13, 05:57
lars hofhansl 2012-12-13, 07:09
Michel Segel 2012-12-13, 08:43
bigdata 2012-12-13, 09:13
Mohammad Tariq 2012-12-13, 09:42
bigdata 2012-12-13, 09:47
Mohammad Tariq 2012-12-13, 10:13
bigdata 2012-12-13, 14:28
Mohammad Tariq 2012-12-13, 14:44
Kevin Odell 2012-12-13, 14:47
Mohammad Tariq 2012-12-13, 15:06
Kevin Odell 2012-12-13, 15:30
Mohammad Tariq 2012-12-13, 15:33
Manoj Babu 2012-12-13, 16:38
Kevin Odell 2012-12-13, 16:42
Re: How to design a data warehouse in HBase?
Michel Segel 2012-12-14, 00:49
I don't know that I would recommend Impala at this stage in its development.
Sorry, it has a bit of growing up to do.

It's interesting, but no UDFs, right?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 13, 2012, at 4:42 PM, "Kevin O'dell" <[EMAIL PROTECTED]> wrote:

> Correct, Impala relies on the Hive Metastore.
>
> On Thu, Dec 13, 2012 at 11:38 AM, Manoj Babu <[EMAIL PROTECTED]> wrote:
>
>> Kevin,
>>
>> Impala requires Hive, right?
>> So to get the advantages of Impala, do we need to go with Hive?
>>
>>
>> Cheers!
>> Manoj.
>>
>>
>>
>> On Thu, Dec 13, 2012 at 9:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> Thank you so much for the clarification Kevin.
>>>
>>> Regards,
>>>    Mohammad Tariq
>>>
>>>
>>>
>>> On Thu, Dec 13, 2012 at 9:00 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>>>
>>>> Mohammad,
>>>>
>>>>  I am not sure you are thinking about Impala correctly.  It still uses
>>>> HDFS, so your data increasing over time is fine.  You are not going to
>>>> need to tune for special CPU, storage, or network.  Typically with
>>>> Impala you are going to be bound at the disks, as it functions off of
>>>> data locality.  You can also use Snappy, GZip, or BZip compression to
>>>> help with the amount of data you are storing.  You will not need to
>>>> frequently update your hardware.
>>>>
>>>> On Thu, Dec 13, 2012 at 10:06 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Oh yes..Impala..good point by Kevin.
>>>>>
>>>>> Kevin: Would it be appropriate to say that I should go for Impala if
>>>>> my data is not going to increase dramatically over time, or if I have
>>>>> to work on only a subset of my BigData? Since Impala uses MPP, it may
>>>>> require specialized hardware tuned for CPU, storage and network
>>>>> performance for better results, which could become a problem if I have
>>>>> to upgrade the hardware frequently because of the growing data.
>>>>>
>>>>> Regards,
>>>>>    Mohammad Tariq
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 13, 2012 at 8:17 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> To Mohammad's point: you can use HBase for quick scans of the data,
>>>>>> Hive for your longer-running jobs, and Impala over the two for quick
>>>>>> ad hoc searches.
>>>>>>
>>>>>> On Thu, Dec 13, 2012 at 9:44 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> I am not saying HBase is not good. My point was to consider Hive as
>>>>>>> well. Think about the approach keeping both tools in mind and decide.
>>>>>>> I just provided an option keeping in mind the available built-in Hive
>>>>>>> features. I would like to add one more point here: you can map your
>>>>>>> HBase tables to Hive.
>>>>>>>
>>>>>>> Regards,
>>>>>>>    Mohammad Tariq
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 13, 2012 at 7:58 PM, bigdata <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Hi, Tariq
>>>>>>>> Thanks for your feedback. Actually, now we have two ways to reach
>>>>>>>> the target, by Hive and by HBase. Could you tell me why HBase is not
>>>>>>>> good for my requirements? Or what's the problem in my solution?
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>>> Date: Thu, 13 Dec 2012 15:43:25 +0530
>>>>>>>>> Subject: Re: How to design a data warehouse in HBase?
>>>>>>>>> To: [EMAIL PROTECTED]
>>>>>>>>>
>>>>>>>>> Both have got different purposes. Normally people say that Hive is
>>>>>>>>> slow, but that's just because it uses MapReduce under the hood. And
>>>>>>>>> I'm sure that if the data stored in HBase is very large, nobody
>>>>>>>>> would write sequential programs for Get or Scan. Instead they will
>>>>>>>>> write MapReduce jobs or do something similar.
>>>>>>>>>
>>>>>>>>> My point is that nothing can be 100% real time. Is that what you
Michael Segel 2012-12-13, 20:20
Asaf Mesika 2012-12-15, 02:14