Jacques 2010-07-30, 20:15
Jonathan Gray 2010-08-01, 18:39
Thanks, that was very helpful.
Regarding 24gb: I saw people using servers with 32gb of memory (in a
recent thread here and on hstack.org). I extrapolated the hbase share from
that, since it seems most people use ~8gb for hdfs/mr.
On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> > -----Original Message-----
> > From: Jacques [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, July 30, 2010 1:16 PM
> > To: [EMAIL PROTECTED]
> > Subject: Memory Consumption and Processing questions
> > Hello all,
> > I'm planning an hbase implementation and had some questions I was hoping
> > someone could help with.
> > 1. Can someone give me a basic overview of how memory is used in Hbase?
> > Various places on the web people state that 16-24gb is the minimum for
> > region servers if they also operate as hdfs/mr nodes. Assuming that hdfs/mr
> > nodes consume ~8gb that leaves a "minimum" of 8-16gb for hbase. It seems
> > like lots of people suggest using even 24gb+ for hbase. Why so much?
> > Is it simply to avoid gc problems? To have data in memory for fast random
> > reads? Or?
> Where exactly are you reading this from? I'm not actually aware of people
> using 24GB+ heaps for HBase.
> I would not recommend using less than 4GB for RegionServers. Beyond that,
> it very much depends on your application. 8GB is often sufficient but I've
> seen as much as 16GB used in production.
> You need at least 4GB because of GC. General experience has been that
> below that the CMS GC does not work well.
> Memory is used primarily for the MemStores (write cache) and Block Cache
> (read cache). In addition, memory is allocated as part of normal operations
> to store in-memory state and in processing reads.
> > 2. What types of things put more/less pressure on memory? I saw insinuation
> > that insert speed can create substantial memory pressure. What type of
> > relative memory pressure do scanners, random reads, random writes, region
> > quantity and compactions cause?
> Writes are buffered and flushed to disk when the write buffer gets to a
> local or global limit. The local limit (per region) defaults to 64MB. The
> global limit is based on the total amount of heap available (default, I
> think, is 40%). So there is interplay between how much heap you have and
> how many regions are actively written to. If you have too many regions and
> not enough memory to allow them to hit the local/region limit, you end up
> flushing undersized files.
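
The interplay between heap size and the number of actively written regions described above can be sketched with some rough arithmetic. This is a minimal Python sketch, not HBase code: the function names are hypothetical, the 64MB and 40% figures are the defaults quoted above, and it assumes writes are spread evenly across regions, which real workloads rarely are.

```python
MB = 1024 * 1024
GB = 1024 * MB


def expected_flush_size(heap_bytes, active_regions,
                        region_limit=64 * MB, global_fraction=0.40):
    """Approximate per-region write buffer size (bytes) at flush time.

    Assumption: every active region receives an equal share of writes,
    so each one gets an equal share of the global limit.
    """
    if active_regions <= 0:
        raise ValueError("active_regions must be positive")
    global_limit = heap_bytes * global_fraction
    fair_share = global_limit / active_regions
    # A region flushes at the local limit unless the global limit is hit first.
    return min(region_limit, fair_share)


def max_regions_at_full_flush(heap_bytes, region_limit=64 * MB,
                              global_fraction=0.40):
    """How many regions can fill to the local limit within the global limit."""
    return int(heap_bytes * global_fraction // region_limit)
```

Under these assumptions, an 8GB heap leaves room for about 51 regions to reach the 64MB local limit; with 200 actively written regions, each one would flush at roughly 16MB, which is the "undersized files" case.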
> Scanning/random reading will utilize the block cache, if configured to.
> The more room for the block cache, the more data you can keep in-memory.
> Reads from the block cache are significantly faster than non-cached reads.
> Compactions are not generally an issue.
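
To illustrate why more room for the block cache means more reads served from memory, here is a minimal sketch of a least-recently-used cache in Python. It is a plain LRU stand-in and not HBase's actual block cache implementation; the class name, the `load_block` callback, and the capacity expressed as a number of blocks are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class BlockCacheSketch:
    """Toy LRU cache keyed by block id (illustrative, not HBase code)."""

    def __init__(self, capacity_blocks, load_block):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be at least 1")
        self._capacity = capacity_blocks
        self._load_block = load_block  # stands in for the slow, non-cached read
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_id)
            self.hits += 1
            return self._blocks[block_id]
        # Cache miss: do the slow read, then cache the result.
        self.misses += 1
        data = self._load_block(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data
```

With a larger `capacity_blocks`, fewer blocks are evicted before they are read again, so a larger share of `get` calls avoids the slow path.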
> > 2. How cpu intensive are the region servers? It seems like most of their
> > performance is based on i/o. (I've noted the caution against starving region
> > servers of cycles--which seems primarily focused on avoiding zk timeout >
> > region reassignment problems.) Does anyone suggest or recommend against
> > dedicating only one or two cores to a region server? Do individual
> > compactions benefit from multiple cores or are they single-threaded?
> I would dedicate at least one core to a region server, but as we add more
> and more concurrency, it may become important to have two cores available.
> Many things, like compactions, are only single threaded today but there's a
> very good chance you will be able to configure multiple threads in the next
> major release.
> > 3. What are the memory and cpu resource demands of the master server? It
> > seems like more and more of that load is moving to zk.
> Not too much. I'm putting a change in TRUNK right now that keeps all
> region assignments in the master, so there is some memory usage, but not
> much. I would think 2GB heap and 1-2 cores is sufficient.
> > 4. General HDFS question-- when the namenode dies, what happens to the
Jonathan Gray 2010-08-02, 04:08
Jacques 2010-08-02, 15:33
Edward Capriolo 2010-08-02, 15:39
Jacques 2010-08-02, 23:00
Jean-Daniel Cryans 2010-08-02, 23:21
Jacques 2010-08-02, 23:41
Jean-Daniel Cryans 2010-08-02, 23:44
Jacques 2010-08-03, 00:27