HBase user mailing list: Memory Consumption and Processing questions


Jacques 2010-07-30, 20:15
Jonathan Gray 2010-08-01, 18:39

Re: Memory Consumption and Processing questions
Thanks, that was very helpful.

Regarding 24GB: I saw people using servers with 32GB of memory (a recent
thread here and on hstack.org).  I extrapolated the HBase share since it
seems most people use ~8GB for HDFS/MR.

-Jacques
On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

>
>
> > -----Original Message-----
> > From: Jacques [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, July 30, 2010 1:16 PM
> > To: [EMAIL PROTECTED]
> > Subject: Memory Consumption and Processing questions
> >
> > Hello all,
> >
> > I'm planning an HBase implementation and had some questions I was
> > hoping someone could help with.
> >
> > 1. Can someone give me a basic overview of how memory is used in HBase?
> > Various places on the web state that 16-24GB is the minimum for region
> > servers if they also operate as HDFS/MR nodes.  Assuming that HDFS/MR
> > nodes consume ~8GB, that leaves a "minimum" of 8-16GB for HBase.  It
> > seems like lots of people suggest using even 24GB+ for HBase.  Why so
> > much?  Is it simply to avoid GC problems?  To keep data in memory for
> > fast random reads?  Or something else?
>
> Where exactly are you reading this from?  I'm not actually aware of people
> using 24GB+ heaps for HBase.
>
> I would not recommend using less than 4GB for RegionServers.  Beyond that,
> it very much depends on your application.  8GB is often sufficient but I've
> seen as much as 16GB used in production.
>
> You need at least 4GB because of GC.  General experience has been that
> below that the CMS GC does not work well.
>
> Memory is used primarily for the MemStores (write cache) and Block Cache
> (read cache).  In addition, memory is allocated as part of normal operations
> to store in-memory state and in processing reads.
>
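
Putting rough numbers on that split can help.  A minimal sketch, assuming a
hypothetical 8GB RegionServer heap, the ~40% global memstore limit mentioned
later in this reply, and a 20% block cache fraction (my assumption for this
era's default; check hbase-default.xml for your version):

    // Rough, non-authoritative heap budget for a RegionServer.
    // 0.40 for memstores is from this thread; 0.20 for the block
    // cache is an assumed default.
    public class HeapBudget {
        public static void main(String[] args) {
            double heapGB = 8.0;               // hypothetical RegionServer heap
            double memstoreFraction = 0.40;    // global memstore (write cache) limit
            double blockCacheFraction = 0.20;  // hfile.block.cache.size (assumed)
            System.out.printf("memstores (write cache):  %.1f GB%n", heapGB * memstoreFraction);
            System.out.printf("block cache (read cache): %.1f GB%n", heapGB * blockCacheFraction);
            System.out.printf("everything else (GC headroom, in-flight state): %.1f GB%n",
                    heapGB * (1 - memstoreFraction - blockCacheFraction));
        }
    }
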
> > 2. What types of things put more/less pressure on memory?  I saw it
> > insinuated that insert speed can create substantial memory pressure.
> > What relative memory pressure do scanners, random reads, random writes,
> > region quantity, and compactions cause?
>
> Writes are buffered and flushed to disk when the write buffer gets to a
> local or global limit.  The local limit (per region) defaults to 64MB.  The
> global limit is based on the total amount of heap available (default, I
> think, is 40%).  So there is interplay between how much heap you have and
> how many regions are actively written to.  If you have too many regions and
> not enough memory to allow them to hit the local/region limit, you end up
> flushing undersized files.
>
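
That interplay is easy to see with a little arithmetic.  A sketch, using the
64MB and ~40% figures from the paragraph above plus a hypothetical 8GB heap
and hot-region count:

    // When do too many actively-written regions force undersized flushes?
    public class FlushMath {
        public static void main(String[] args) {
            long heap = 8L << 30;                      // hypothetical 8GB heap
            long globalBudget = (long) (heap * 0.40);  // ~3.2GB for all memstores
            long flushSize = 64L << 20;                // 64MB per-region flush size
            // Only ~51 regions can fill a 64MB memstore before the global
            // limit is hit; with more hot regions, flushes start early.
            System.out.println("regions at full flush size: " + globalBudget / flushSize);
            int activeRegions = 200;                   // hypothetical hot-region count
            System.out.println("average flush with " + activeRegions + " active regions: "
                    + (globalBudget / activeRegions >> 20) + "MB");
        }
    }
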
> Scanning/random reading will utilize the block cache, if configured to.
>  The more room for the block cache, the more data you can keep in-memory.
>  Reads from the block cache are significantly faster than non-cached reads,
> obviously.
>
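
On "if configured to": the client API of this era also lets an individual
scan opt out of the cache, a common way to keep one big sequential scan from
evicting hot random-read blocks.  A sketch (Scan.setCaching and
Scan.setCacheBlocks are the standard client methods):

    // A large sequential scan that bypasses the block cache so it does
    // not evict blocks that concurrent random reads are hitting.
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanSetup {
        public static Scan bigScan() {
            Scan scan = new Scan();
            scan.setCaching(1000);       // rows fetched per RPC round-trip
            scan.setCacheBlocks(false);  // skip the block cache for this scan
            return scan;
        }
    }
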
> Compactions are not generally an issue.
>
> > 3. How CPU intensive are the region servers?  It seems like most of
> > their performance is based on I/O.  (I've noted the caution against
> > starving region servers of cycles, which seems primarily focused on
> > avoiding ZK timeouts and the resulting region reassignment problems.)
> > Does anyone suggest or recommend against dedicating only one or two
> > cores to a region server?  Do individual compactions benefit from
> > multiple cores, or are they single-threaded?
>
> I would dedicate at least one core to a region server, but as we add more
> and more concurrency, it may become important to have two cores available.
>  Many things, like compactions, are only single-threaded today, but there's a
> very good chance you will be able to configure multiple threads in the next
> major release.
>
> > 4. What are the memory and CPU resource demands of the master server?
> > It seems like more and more of that load is moving to ZK.
>
> Not too much.  I'm putting a change in TRUNK right now that keeps all
> region assignments in the master, so there is some memory usage, but not
> much.  I would think 2GB heap and 1-2 cores is sufficient.
>
> > 5. General HDFS question: when the namenode dies, what happens to the

Jonathan Gray 2010-08-02, 04:08
Jacques 2010-08-02, 15:33
Edward Capriolo 2010-08-02, 15:39
Jacques 2010-08-02, 23:00
Jean-Daniel Cryans 2010-08-02, 23:21
Jacques 2010-08-02, 23:41
Jean-Daniel Cryans 2010-08-02, 23:44
Jacques 2010-08-03, 00:27