HBase, mail # user - Memory Consumption and Processing questions

Re: Memory Consumption and Processing questions
Jacques 2010-08-02, 00:29
Thanks, that was very helpful.

Regarding 24 GB: I saw people using servers with 32 GB of memory (a recent
thread here and hstack.org).  I extrapolated the HBase figure from that, since
it seems most people use ~8 GB for HDFS/MR.

> > Hello all,
> >
> > I'm planning an hbase implementation and had some questions I was hoping
> > someone could help with.
> >
> > 1. Can someone give me a basic overview of how memory is used in HBase?
> >  Various places on the web people state that 16-24gb is the minimum for
> > region servers if they also operate as hdfs/mr nodes.  Assuming that hdfs/mr
> > nodes consume ~8gb, that leaves a "minimum" of 8-16gb for hbase.  It seems
> > like lots of people suggest using even 24gb+ for hbase.  Why so much?
> >  Is it simply to avoid gc problems?  To have data in memory for fast random
> > reads?  Or?
> Where exactly are you reading this from?  I'm not actually aware of people
> using 24GB+ heaps for HBase.
> I would not recommend using less than 4GB for RegionServers.  Beyond that,
> it very much depends on your application.  8GB is often sufficient but I've
> seen as much as 16GB used in production.
> You need at least 4GB because of GC.  General experience has been that
> below that the CMS GC does not work well.
> Memory is used primarily for the MemStores (write cache) and Block Cache
> (read cache).  In addition, memory is allocated as part of normal operations
> to store in-memory state and in processing reads.
> > 2. What types of things put more/less pressure on memory?  I saw insinuation
> > that insert speed can create substantial memory pressure.  What type of
> > relative memory pressure do scanners, random reads, random writes, region
> > quantity and compactions cause?
> Writes are buffered and flushed to disk when the write buffer gets to a
> local or global limit.  The local limit (per region) defaults to 64MB.  The
> global limit is based on the total amount of heap available (default, I
> think, is 40%).  So there is interplay between how much heap you have and
> how many regions are actively written to.  If you have too many regions and
> not enough memory to allow them to hit the local/region limit, you end up
> flushing undersized files.
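
The interplay between heap size and the number of actively written regions can be made concrete with a little arithmetic. The following is a rough sketch, not HBase code: it assumes the defaults quoted in the reply (64MB per-region limit, 40% of heap as the global limit), assumes writes are spread evenly across regions, and uses hypothetical helper names.

```python
# Back-of-the-envelope sketch. Function names are hypothetical; the defaults
# are the figures quoted in the reply (64MB per region, 40% of heap globally).

MB = 1024 * 1024
GB = 1024 * MB


def max_regions_at_full_flush(heap_bytes, region_limit_bytes=64 * MB,
                              global_fraction=0.40):
    """How many actively written regions can each reach the per-region limit
    before the global MemStore limit is hit."""
    global_limit = heap_bytes * global_fraction
    return int(global_limit // region_limit_bytes)


def avg_flush_size_mb(heap_bytes, active_regions, region_limit_bytes=64 * MB,
                      global_fraction=0.40):
    """Approximate flush size per region, in MB, assuming writes are spread
    evenly over the active regions (a simplification)."""
    global_limit = heap_bytes * global_fraction
    per_region = min(region_limit_bytes, global_limit / active_regions)
    return per_region / MB
```

Under these assumptions, a 4GB heap lets about 25 actively written regions fill to 64MB each. With 100 such regions on the same heap, the even-spread estimate drops to roughly 16MB per flush, which is the "undersized files" case described in the reply.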
> Scanning/random reading will utilize the block cache, if configured to.
> The more room for the block cache, the more data you can keep in-memory.
> Reads from the block cache are significantly faster than non-cached reads,
> obviously.
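
To see how the MemStores and the block cache compete for the same heap, here is a minimal sketch of a heap budget. The 40% MemStore share is the default mentioned in the reply; the 20% block cache share is only an assumed illustrative value, not a figure from this thread, and the function name is hypothetical.

```python
# Illustrative heap split. block_cache_fraction=0.20 is an ASSUMED value for
# the example; memstore_fraction=0.40 is the default quoted in the reply.

def heap_budget_mb(heap_mb, memstore_fraction=0.40, block_cache_fraction=0.20):
    """Split a RegionServer heap (in MB) into MemStore, block cache, and
    whatever remains for normal operations (in-memory state, reads, GC room)."""
    if memstore_fraction + block_cache_fraction > 1.0:
        raise ValueError("fractions must not exceed the whole heap")
    memstore = heap_mb * memstore_fraction
    block_cache = heap_mb * block_cache_fraction
    other = heap_mb - memstore - block_cache
    return {"memstore": memstore, "block_cache": block_cache, "other": other}
```

With these numbers, an 8GB (8192MB) heap would give roughly 3.2GB to MemStores and 1.6GB to the block cache, leaving about 3.2GB for everything else. Raising the block cache share keeps more data in memory for reads, but only at the expense of the other two buckets.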
> Compactions are not generally an issue.
> > 2. How cpu intensive are the region servers?  It seems like most of their
> > performance is based on i/o.  (I've noted the caution in starving region
> > servers of cycles--which seems primarily focused on avoiding zk timeout ->
> > region reassignment problems.)  Does anyone suggest or recommend against
> > dedicating only one or two cores to a region server?  Do individual
> > compactions benefit from multiple cores or are they single-threaded?
> I would dedicate at least one core to a region server, but as we add more
> and more concurrency, it may become important to have two cores available.
>  Many things, like compactions, are only single threaded today but there's a
> very good chance you will be able to configure multiple threads in the next
> major release.
> > 3. What are the memory and cpu resource demands of the master server?  It
> > seems like more and more of that load is moving to zk.
> Not too much.  I'm putting a change in TRUNK right now that keeps all
> region assignments in the master, so there is some memory usage, but not
> much.  I would think 2GB heap and 1-2 cores is sufficient.
> > 4. General HDFS question-- when the namenode dies, what happens to the