HBase, mail # user - HBase and Datawarehouse


Thread:
  Kiran 2013-04-28, 03:12
  Em 2013-04-28, 11:12
  shashwat shriparv 2013-04-28, 17:00
  Mohammad Tariq 2013-04-28, 17:27
  Kiran 2013-04-29, 03:39
  anil gupta 2013-04-29, 05:21
  Kiran 2013-04-29, 05:40
  anil gupta 2013-04-29, 17:00
  Mohammad Tariq 2013-04-29, 17:35
  Andrew Purtell 2013-04-29, 18:08
  Asaf Mesika 2013-04-30, 05:54
  Andrew Purtell 2013-04-30, 08:07
  Kevin Odell 2013-04-30, 12:01
  Andrew Purtell 2013-04-30, 17:38
  Amandeep Khurana 2013-04-30, 18:19
  Andrew Purtell 2013-04-30, 18:36
  Michael Segel 2013-04-30, 18:14
  Andrew Purtell 2013-04-30, 18:30
Re: HBase and Datawarehouse
Michael Segel 2013-04-30, 18:42
Hmmm

I don't recommend HBase in situations where you are not running an M/R framework. Sorry, as much as I love HBase, IMHO there are probably better solutions for a standalone NoSQL database. (YMMV depending on your use case.)
The strength of HBase is that it's part of the Hadoop ecosystem.

I would think that it would probably be better to go virtual than to run multiple region servers on bare hardware. You take a hit on I/O, but you can work around that too.

But I'm conservative unless I have to get creative. ;-)

But something to consider when white boarding ideas...

On Apr 30, 2013, at 1:30 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> You wouldn't do that if colocating MR. It is one way to soak up "extra" RAM
> on a large RAM box, although I'm not sure I would recommend it (I have no
> personal experience trying it, yet). For more on this where people are
> actively considering it, see
> https://issues.apache.org/jira/browse/BIGTOP-732
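The multiple-regionservers-per-host idea being discussed in BIGTOP-732 can be sketched roughly as below. This is a hypothetical layout, not the patch from that ticket: the duplicated conf dir, the chosen port numbers, and the `rs2` naming are assumptions for illustration.

```shell
# Hypothetical sketch: starting a second RegionServer on one host.
# The conf-dir layout and port choices are assumptions for illustration.
cp -r /etc/hbase/conf /etc/hbase/conf.rs2

# In conf.rs2/hbase-site.xml, move the second instance off the default ports:
#   hbase.regionserver.port       60020 -> 60021
#   hbase.regionserver.info.port  60030 -> 60031

# Give the second instance its own pid/log identity, then start it
# against the alternate conf dir.
export HBASE_IDENT_STRING="${USER}-rs2"
HBASE_CONF_DIR=/etc/hbase/conf.rs2 hbase-daemon.sh start regionserver
```

Each instance needs disjoint ports and its own pid/log identity, or the second daemon will refuse to start (or clobber the first one's pid file).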
>
> On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel
> <[EMAIL PROTECTED]>wrote:
>
>> Multiple RS per host?
>> Huh?
>>
>> That seems very counterintuitive and potentially problematic with M/R jobs.
>> Could you expand on this?
>>
>> Thx
>>
>> -Mike
>>
>> On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>
>>> Rules of thumb for starting off safely and for easing support issues are
>>> really good to have, but there are no hard barriers or singular approaches:
>>> use Java 7 + G1GC, disable HBase blockcache in lieu of OS blockcache, run
>>> multiple regionservers per host. It is going to depend on how the cluster
>>> is used and loaded. If we are talking about coprocessors, then effective
>>> limits are less clear; using a coprocessor to integrate an external process
>>> implemented with native code communicating over memory-mapped files in
>>> /dev/shm isn't outside what is possible (strawman alert).
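The /dev/shm strawman at the end of Andrew's message can be sketched in miniature: two sides exchange a record through a memory-mapped file on tmpfs, with no sockets involved. Everything here (the file name and the length-prefixed record layout) is invented for illustration, and a real coprocessor would of course be Java inside the regionserver rather than Python.

```python
# Sketch (hypothetical): a coprocessor-style handoff over a memory-mapped
# file in /dev/shm. The path and record layout are invented for illustration.
import mmap
import os
import struct
import tempfile

# /dev/shm is Linux tmpfs; fall back to an ordinary temp dir elsewhere.
shm_dir = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
path = os.path.join(shm_dir, "hbase_cp_demo.mmap")
SIZE = 4096

# "Coprocessor" side: write a length-prefixed payload into the shared map.
with open(path, "wb") as f:
    f.truncate(SIZE)
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as m:
        payload = b"row-42:cf:value"
        m[:4] = struct.pack("<I", len(payload))
        m[4:4 + len(payload)] = payload

# "External process" side: read the payload back, no sockets involved.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ) as m:
        (n,) = struct.unpack("<I", m[:4])
        data = bytes(m[4:4 + n])

os.remove(path)
print(data.decode())  # row-42:cf:value
```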
>>>
>>>
>>> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[EMAIL PROTECTED]
>>> wrote:
>>>
>>>> Asaf,
>>>>
>>>> The heap barrier is something of a legend :)  You can ask 10 different
>>>> HBase committers what they think the max heap is and get 10 different
>>>> answers.  This is my take on heap sizes from the many clusters I have
>> dealt
>>>> with:
>>>>
>>>> 8GB -> Standard heap size, and tends to run fine without any tuning
>>>>
>>>> 12GB -> Needs some TLC with regards to JVM tuning if your workload tends
>>>> to cause churn (usually blockcache)
>>>>
>>>> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
>>>> and ZK timeouts
>>>>
>>>> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
>>>> the ZK timeout a little higher
>>>>
>>>> 32GB -> We do have a couple of people running this high, but the pain
>>>> outweighs the gains (IMHO)
>>>>
>>>> 64GB -> Let me know how it goes :)
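Kevin's tiers above show where the tuning effort goes as the heap grows. As a concrete, hypothetical starting point for the 16GB tier, hbase-env.sh might carry something like the following; the flag and timeout values are illustrative assumptions to measure against your own workload, not recommendations:

```shell
# hbase-env.sh (sketch) -- illustrative values, not a prescription.
export HBASE_HEAPSIZE=16384   # 16 GB RegionServer heap

# CMS-era GC tuning, plus GC logging so churn can actually be observed.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+CMSParallelRemarkEnabled \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# Companion hbase-site.xml settings at this tier (per the MSLAB/ZK note):
#   hbase.hregion.memstore.mslab.enabled = true
#   zookeeper.session.timeout            = 90000  (raise further past ~20GB)
```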
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> I don't wish to be rude, but you are presenting odd claims as fact, as
>>>>> "mentioned in a couple of posts". It will be difficult to have a serious
>>>>> conversation. I encourage you to test your hypotheses and let us know if
>>>>> in fact there is a JVM "heap barrier" (and where it may be).
>>>>>
>>>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>>>
>>>>>> I think for Phoenix to truly succeed, it needs HBase to break the JVM
>>>>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>>>>> analytics queries utilize memory, and since that memory is shared with
>>>>>> HBase, there's only so much you can do on a 12GB heap. On the other
>>>>>> hand, if Phoenix were implemented outside HBase on the same machine (as
>>>>>> Drill or Impala do), you could have 60GB for that process, running many
>>>>>> OLAP queries in parallel, utilizing the same data set.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[EMAIL PROTECTED]
  Michael Segel 2013-04-30, 13:17
  James Taylor 2013-04-30, 06:28
  Viral Bajaria 2013-04-30, 06:02