HBase user mailing list


Re: HBase and Datawarehouse
Asaf,

  The heap barrier is something of a legend :)  You can ask 10 different
HBase committers what they think the max heap is and get 10 different
answers.  This is my take on heap sizes from the many clusters I have dealt
with:

8GB -> Standard heap size, and tends to run fine without any tuning

12GB -> Needs some TLC with regard to JVM tuning if your workload tends to
cause churn (usually blockcache)

16GB -> GC tuning is a must, and now we need to start looking into MSLAB
and ZK timeouts

20GB -> Same as 16GB with regard to tuning, but we usually need to raise the
ZK timeout a little further

32GB -> We do have a couple of people running this high, but the pain
outweighs the gains (IMHO)

64GB -> Let me know how it goes :)
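
For quick reference, the tiers above can be written down as a lookup. This is a minimal Python sketch, not an HBase tool: `heap_advice` and `HEAP_TIERS` are hypothetical names, and it assumes a heap between two listed sizes should be treated like the next size up.

```python
# Region server heap rules of thumb from the list above, as
# (upper bound in GB, note). Names here are hypothetical.
HEAP_TIERS = [
    (8, "standard heap size; tends to run fine without any tuning"),
    (12, "needs some JVM tuning if the workload causes churn (usually blockcache)"),
    (16, "GC tuning is a must; start looking into MSLAB and ZK timeouts"),
    (20, "same tuning as 16GB, but the ZK timeout usually has to go a little higher"),
    (32, "a couple of people run this high, but the pain outweighs the gains (IMHO)"),
]


def heap_advice(heap_gb):
    """Return the rule-of-thumb note for a heap of `heap_gb` gigabytes.

    Assumption: a size between two listed tiers gets the note for the next
    tier up, and anything at or below 8GB counts as standard.
    """
    if heap_gb <= 0:
        raise ValueError("heap size must be positive")
    for limit, note in HEAP_TIERS:
        if heap_gb <= limit:
            return note
    return "above 32GB: largely untested, so measure before committing"


print(heap_advice(10))
```

For example, heap_advice(10) returns the 12GB note. The thresholds are one engineer's experience across clusters, not hard limits.
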
On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> I don't wish to be rude, but you are presenting odd claims as fact on the
> basis that they were "mentioned in a couple of posts". It will be difficult
> to have a serious conversation. I encourage you to test your hypotheses and
> let us know if there is in fact a JVM "heap barrier" (and where it may be).
>
> On Monday, April 29, 2013, Asaf Mesika wrote:
>
> > I think for Phoenix to truly succeed, it needs HBase to break the JVM
> > heap barrier of 12GB that I saw mentioned in a couple of posts. Many
> > analytics queries are memory-intensive, and since Phoenix shares its
> > memory with HBase, there's only so much you can do on a 12GB heap. On the
> > other hand, if Phoenix were implemented outside HBase on the same machine
> > (as Drill and Impala are), you could give that process 60GB and run many
> > OLAP queries in parallel over the same data set.
> >
> >
> >
> > On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[EMAIL PROTECTED]>
> > wrote:
> >
> > > > HBase is not really intended for heavy data crunching
> > >
> > > Yes it is. This is why we have first class MapReduce integration and
> > > optimized scanners.
> > >
> > > Recent versions, like 0.94, also do pretty well with the 'O' part of
> > > OLAP.
> > >
> > > Urban Airship's Datacube is an example of a successful OLAP project
> > > implemented on HBase: http://github.com/urbanairship/datacube
> > >
> > > "Urban Airship uses the datacube project to support its analytics stack
> > > for mobile apps. We handle about ~10K events per second per node."
> > >
> > >
> > > Also there is Adobe's SaasBase:
> > > http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
> > >
> > > Etc.
> > >
> > > Where an HBase OLAP application will differ tremendously from a
> > > traditional data warehouse is of course in the interface to the
> > > datastore. You have to design and speak in the language of the HBase API,
> > > though Phoenix (https://github.com/forcedotcom/phoenix) is changing that.
> > >
> > >
> > > On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hi Kiran,
> > > >
> > > > In HBase the data is denormalized, but at its core HBase is a
> > > > KeyValue-based database meant for lookups or queries that expect a
> > > > response in milliseconds. OLAP, i.e. data warehousing, usually involves
> > > > heavy data crunching. HBase is not really intended for heavy data
> > > > crunching. If you just want to store denormalized data and run simple
> > > > queries, then HBase is good. For OLAP-type workloads you can make HBase
> > > > work, but IMO you will be better off using Hive for data warehousing.
> > > >
> > > > HTH,
> > > > Anil Gupta
> > > >
> > > >
> > > > On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > But in HBase, data can be said to be in a denormalised state, as
> > > > > storage uses a flexible schema based on (column family:column).
> > > > > Also, from Google's Bigtable paper it is evident that HBase is
> > > > > capable of doing OLAP. So where does the difference lie?
> > > > >
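
To make the data-model side of this exchange concrete (rows of family:qualifier columns, fast point lookups, and scans over a row-key range), here is a toy in-memory model in Python. It is a sketch under stated assumptions, not HBase code: the class and method names are hypothetical, values are plain Python objects rather than byte arrays, and cell timestamps/versions are left out. The property it deliberately copies from HBase is that rows are kept sorted by row key and a scan runs from a start key (inclusive) to a stop key (exclusive).

```python
import bisect


class SortedTable:
    """Toy, in-memory model of an HBase-style table (hypothetical names).

    Row keys are kept in sorted order, and each row maps "family:qualifier"
    column names to values, so different rows may carry different columns.
    """

    def __init__(self):
        self._keys = []  # row keys, always sorted
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        """Write one cell, creating the row if needed."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Point lookup of a single row; returns {} if the row is absent."""
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        """Yield (row, columns) for start <= row < stop, in row-key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for row in self._keys[lo:hi]:
            yield row, dict(self._rows[row])


t = SortedTable()
t.put("user42|2013-04-28", "d:clicks", 3)
t.put("user42|2013-04-29", "d:clicks", 5)
t.put("user42|2013-04-29", "d:country", "US")  # extra column, same row
t.put("user7|2013-04-28", "d:clicks", 9)

print(t.get("user42|2013-04-29"))  # {'d:clicks': 5, 'd:country': 'US'}
# Sum one user's clicks with a single contiguous range scan.
print(sum(cols["d:clicks"] for _, cols in t.scan("user42|", "user42}")))  # 8
```

Because the hypothetical row keys are laid out as user|date and '}' sorts just after '|', one user's history is a single contiguous scan. That is the kind of crunching scanners and MapReduce handle well; what the model lacks is a query language, which is the interface gap Phoenix is aiming at.
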

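One common way to do OLAP on a KeyValue store, and roughly what a counter-cube project like datacube is built around, is pre-aggregation: rollup counters are updated as events arrive, so a query reads a few precomputed cells instead of crunching raw rows. The Python below is a minimal sketch of that idea only. It is not the datacube library's API; the class, the rollup levels, and the key layout are hypothetical, and a Counter stands in for an HBase table whose cells would be updated with atomic increments.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rollup levels and the time-bucket format used for each.
ROLLUPS = {
    "hour": "%Y-%m-%dT%H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


class RollupCube:
    """Toy pre-aggregated counter cube; illustration only, not datacube's API."""

    def __init__(self):
        # Stands in for an HBase table: (app, level, bucket) -> counter value.
        self.counts = Counter()

    def record(self, app, ts, n=1):
        """Bump one counter per rollup level for an event from `app` at `ts`."""
        for level, fmt in ROLLUPS.items():
            self.counts[(app, level, ts.strftime(fmt))] += n

    def query(self, app, level, bucket):
        """Read a precomputed total; unknown keys count as 0."""
        return self.counts[(app, level, bucket)]


cube = RollupCube()
cube.record("appA", datetime(2013, 4, 28, 10, 5))
cube.record("appA", datetime(2013, 4, 28, 11, 30))
cube.record("appA", datetime(2013, 4, 29, 9, 0), n=2)
print(cube.query("appA", "day", "2013-04-28"))  # 2
print(cube.query("appA", "month", "2013-04"))   # 4
```

The trade-off is that every write costs one increment per rollup level, and only the aggregates planned for in advance are cheap to read; ad hoc questions still need a scan, MapReduce, or a tool such as Hive.
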
Kevin O'Dell
Systems Engineer, Cloudera