HBase >> mail # user >> HBase and Datawarehouse


Kiran 2013-04-28, 03:12
Em 2013-04-28, 11:12
shashwat shriparv 2013-04-28, 17:00
Mohammad Tariq 2013-04-28, 17:27
Kiran 2013-04-29, 03:39
anil gupta 2013-04-29, 05:21
Kiran 2013-04-29, 05:40
anil gupta 2013-04-29, 17:00
Mohammad Tariq 2013-04-29, 17:35
Andrew Purtell 2013-04-29, 18:08
Asaf Mesika 2013-04-30, 05:54
Andrew Purtell 2013-04-30, 08:07
Kevin Odell 2013-04-30, 12:01
Andrew Purtell 2013-04-30, 17:38
Amandeep Khurana 2013-04-30, 18:19
Andrew Purtell 2013-04-30, 18:36
Michael Segel 2013-04-30, 18:14
Re: HBase and Datawarehouse
You wouldn't do that if colocating MR. It is one way to soak up "extra" RAM
on a large RAM box, although I'm not sure I would recommend it (I have no
personal experience trying it, yet). For more on this where people are
actively considering it, see
https://issues.apache.org/jira/browse/BIGTOP-732
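[For anyone wanting to try the multiple-regionservers-per-host idea, HBase ships a helper script that starts extra regionserver instances on one host with offset ports; it is intended for test/dev setups rather than production, and the install path below is hypothetical:]

```shell
# Start two additional regionserver instances on this host.
# local-regionservers.sh ships with HBase and offsets each instance's
# ports by the given index; it is meant for test/dev use.
cd /usr/lib/hbase   # hypothetical install location
bin/local-regionservers.sh start 1 2

# Stop them again when done.
bin/local-regionservers.sh stop 1 2
```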

On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel
<[EMAIL PROTECTED]> wrote:

> Multiple RS per host?
> Huh?
>
> That seems very counter intuitive and potentially problematic w M/R jobs.
> Could you expand on this?
>
> Thx
>
> -Mike
>
> On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > Rules of thumb for starting off safely and for easing support issues are
> > really good to have, but there are no hard barriers or singular
> > approaches: use Java 7 + G1GC, disable the HBase blockcache in favor of
> > the OS block cache, run multiple regionservers per host. It is going to
> > depend on how the cluster is used and loaded. If we are talking about
> > coprocessors, then effective limits are even less clear: using a
> > coprocessor to integrate an external native-code process communicating
> > over memory-mapped files in /dev/shm isn't outside what is possible
> > (strawman alert).
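[For concreteness, the JVM side of those suggestions would live in hbase-env.sh; a minimal sketch, assuming Java 7 on the regionservers — the flag values are illustrative, not tuned recommendations:]

```shell
# hbase-env.sh sketch: run regionservers on G1GC (requires Java 7+).
# The pause target below is an illustrative assumption, not advice.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100"

# Leaning on the OS page cache instead of the HBase blockcache is
# configured in hbase-site.xml by shrinking hfile.block.cache.size
# (the fraction of heap given to the block cache).
```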
> >
> >
> > On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Asaf,
> >>
> >>  The heap barrier is something of a legend :)  You can ask 10 different
> >> HBase committers what they think the max heap is and get 10 different
> >> answers.  This is my take on heap sizes from the many clusters I have
> dealt
> >> with:
> >>
> >> 8GB -> Standard heap size, and tends to run fine without any tuning
> >>
> >> 12GB -> Needs some TLC with regards to JVM tuning if your workload tends
> >> to cause churn (usually blockcache)
> >>
> >> 16GB -> GC tuning is a must, and now we need to start looking into MSLab
> >> and ZK timeouts
> >>
> >> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
> >> the ZK timeout a little higher
> >>
> >> 32GB -> We do have a couple people running this high, but the pain
> >> outweighs the gains (IMHO)
> >>
> >> 64GB -> Let me know how it goes :)
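[The MSLAB and ZK-timeout tuning Kevin mentions for the 16GB+ tiers maps to a couple of hbase-site.xml properties; a rough sketch, with illustrative values rather than recommendations:]

```xml
<!-- hbase-site.xml sketch for larger heaps: enable MSLAB to reduce
     memstore fragmentation, and give the regionserver more headroom
     before ZooKeeper declares it dead. Values are illustrative. -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value> <!-- milliseconds -->
</property>
```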
> >>
> >>
> >>
> >>
> >> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> I don't wish to be rude, but you are making odd claims as fact, sourced
> >>> only as "mentioned in a couple of posts". It will be difficult to have
> >>> a serious conversation. I encourage you to test your hypotheses and let
> >>> us know if in fact there is a JVM "heap barrier" (and where it may be).
> >>>
> >>> On Monday, April 29, 2013, Asaf Mesika wrote:
> >>>
> >>>> I think for Phoenix to truly succeed, it needs HBase to break the JVM
> >>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
> >>>> analytics queries utilize memory, and since that memory is shared with
> >>>> HBase, there's only so much you can do on a 12GB heap. On the other
> >>>> hand, if Phoenix were implemented outside HBase on the same machine
> >>>> (as Drill or Impala do), you could have 60GB for this process, running
> >>>> many OLAP queries in parallel, utilizing the same data set.
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>>> HBase is not really intended for heavy data crunching
> >>>>>
> >>>>> Yes it is. This is why we have first class MapReduce integration and
> >>>>> optimized scanners.
> >>>>>
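[As one concrete example of that MapReduce integration, HBase bundles ready-made jobs that scan a table; the table name below is hypothetical:]

```shell
# Count the rows of a table using HBase's bundled MapReduce job.
# 'mytable' is a hypothetical table name; requires a running cluster.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable
```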
> >>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
> >>>> OLAP.
> >>>>>
> >>>>> Urban Airship's Datacube is an example of a successful OLAP project
> >>>>> implemented on HBase: http://github.com/urbanairship/datacube
> >>>>>
> >>>>> "Urban Airship uses the datacube project to support its analytics
> >>>>> stack for mobile apps. We handle about ~10K events per second per
> >>>>> node."
> >>>>>
> >>>>>
> >>>>> Also there is Adobe's SaasBase:
> >>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Michael Segel 2013-04-30, 18:42
Michael Segel 2013-04-30, 13:17
James Taylor 2013-04-30, 06:28
Viral Bajaria 2013-04-30, 06:02