Re: HBase and Datawarehouse
Tell me why your RS needs to be that large? (> 8 GB)

I think the answer is that it depends. Especially when you start to add in coprocessors.
I'm not saying that there are no legitimate reasons, but a lot of the time people just up the heap size without thinking about the problem.
To Kevin's point, when you exceed a certain point, you're going to need to really start to think about the tuning process.

MSLAB is now on by default, or so I am told.
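For anyone who wants to check or flip it on an older install, the switch lives in hbase-site.xml. A minimal sketch (the chunk size shown is just the common 2 MB default, included purely for illustration):

  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value> <!-- 2 MB chunks; illustrative, this is the usual default -->
  </property>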

-Just because you can do something doesn't mean it's a good idea. ;-)

On Apr 30, 2013, at 7:01 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> Asaf,
>
>  The heap barrier is something of a legend :)  You can ask 10 different
> HBase committers what they think the max heap is and get 10 different
> answers.  This is my take on heap sizes from the many clusters I have dealt
> with:
>
> 8GB -> Standard heap size, and tends to run fine without any tuning
>
> 12GB -> Needs some TLC with regards to JVM tuning if your workload tends
> to cause churn (usually block cache)
>
> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
> and ZK timeouts
>
> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise the
> ZK timeout a little higher
>
> 32GB -> We do have a couple of people running this high, but the pain
> outweighs the gains (IMHO)
>
> 64GB -> Let me know how it goes :)
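To make the tuning at the 16GB-and-up tiers concrete, here is a minimal sketch of the knobs involved; the heap size, GC flags, and timeout value below are illustrative assumptions, not settings recommended anywhere in this thread:

hbase-env.sh (sketch):

  # 16 GB RegionServer heap, CMS collector starting early, GC logging on
  export HBASE_REGIONSERVER_OPTS="-Xms16g -Xmx16g \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

hbase-site.xml (sketch):

  <!-- give long GC pauses headroom before the ZooKeeper session expires -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value> <!-- 2 minutes; illustrative only -->
  </property>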
>
>
>
>
> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
>> I don't wish to be rude, but you are presenting odd claims as fact on the
>> strength of things "mentioned in a couple of posts". That makes it difficult
>> to have a serious conversation. I encourage you to test your hypotheses and
>> let us know if in fact there is a JVM "heap barrier" (and where it may be).
>>
>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>
>>> I think for Phoenix truly to succeed, it needs HBase to break the JVM
>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>> analytics queries utilize memory, and since that memory is shared with
>>> HBase, there's only so much you can do on a 12GB heap. On the other hand,
>>> if Phoenix were implemented outside HBase on the same machine (as Drill or
>>> Impala do), you could have 60GB for that process, running many OLAP
>>> queries in parallel, utilizing the same data set.
>>>
>>>
>>>
>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>>
>>>>> HBase is not really intended for heavy data crunching
>>>>
>>>> Yes it is. This is why we have first class MapReduce integration and
>>>> optimized scanners.
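A minimal sketch of what that integration looks like from the client side, assuming the 0.9x-era org.apache.hadoop.hbase.mapreduce API (the table name, mapper, and counting logic are made up for illustration):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class EventScanJob {

  // Emits (row key, 1) for every row scanned; a real job would aggregate in a reducer.
  static class EventMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text(Bytes.toString(row.get(), row.getOffset(), row.getLength())),
                new LongWritable(1L));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "event-scan");
    job.setJarByClass(EventScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // batch more rows per RPC for a full-table scan
    scan.setCacheBlocks(false);  // don't churn the block cache from an MR scan

    TableMapReduceUtil.initTableMapperJob(
        "events", scan, EventMapper.class, Text.class, LongWritable.class, job);
    job.setNumReduceTasks(0);                        // map-only for this sketch
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}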
>>>>
>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of OLAP.
>>>>
>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>
>>>> "Urban Airship uses the datacube project to support its analytics stack for
>>>> mobile apps. We handle about ~10K events per second per node."
>>>>
>>>>
>>>> Also there is Adobe's SaasBase:
>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>
>>>> Etc.
>>>>
>>>> Where an HBase OLAP application will differ tremendously from a traditional
>>>> data warehouse is of course in the interface to the datastore. You have to
>>>> design and speak in the language of the HBase API, though Phoenix
>>>> (https://github.com/forcedotcom/phoenix) is changing that.
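For a rough sense of the difference, Phoenix puts a JDBC/SQL front end over the same tables instead of hand-written Scans and filters; a hypothetical sketch (the ZooKeeper quorum, table, and columns are placeholders, and the Phoenix JDBC driver is assumed to be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQuerySketch {
  public static void main(String[] args) throws Exception {
    // Phoenix connection URLs take the form jdbc:phoenix:<zookeeper quorum>;
    // "localhost" is a placeholder here.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         PreparedStatement ps = conn.prepareStatement(
             "SELECT event_type, COUNT(*) FROM events GROUP BY event_type");
         ResultSet rs = ps.executeQuery()) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}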
>>>>
>>>>
>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Kiran,
>>>>>
>>>>> In HBase the data is denormalized, but at its core HBase is a KeyValue-based
>>>>> database meant for lookups or queries that expect a response in milliseconds.
>>>>> OLAP, i.e. data warehousing, usually involves heavy data crunching. HBase is
>>>>> not really intended for heavy data crunching. If you want to just store
>>>>> denormalized data and do simple queries then HBase is good. For OLAP