HBase user mailing list: Memory Consumption and Processing questions


Re: Memory Consumption and Processing questions
You're right, of course.  I shouldn't generalize too much.  I'm more trying
to understand the landscape than pinpoint anything specific.

Quick question: since the block cache is unaware of the location of files,
wouldn't it overlap the OS cache for HFiles once they are localized after
compaction?  Any guidance on how to tune the two?

thanks,
Jacques

On Sun, Aug 1, 2010 at 9:08 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> One reason not to extrapolate that is that leaving lots of memory for
> the Linux buffer cache is a good way to improve the overall performance
> of typically I/O-bound applications like Hadoop and HBase.
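
As an aside, not something from this thread: on a live node you can see that
buffer cache directly in the standard Linux free output.

    # Memory not claimed by JVM heaps shows up in the "cached" column
    # and absorbs re-reads of HDFS blocks.
    $ free -m
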
>
> Also, I'm unsure that "most people use ~8 for hdfs/mr".  DataNodes
> require almost no significant memory (though they generally run with a
> 1GB heap); their performance will improve with more free memory for the
> OS buffer cache.  As for MR, it depends entirely on the tasks running.
> The TaskTrackers don't require significant memory either, so it comes
> down to the number of tasks per node and the memory requirements of
> those tasks.
>
> Unfortunately you can't always generalize the requirements too much,
> especially in MR.
>
> JG
>
> > -----Original Message-----
> > From: Jacques [mailto:[EMAIL PROTECTED]]
> > Sent: Sunday, August 01, 2010 5:30 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Memory Consumption and Processing questions
> >
> > Thanks, that was very helpful.
> >
> > Regarding 24GB: I saw people using servers with 32GB of memory (a
> > recent thread here and hstack.org).  I extrapolated the usage since it
> > seems most people use ~8GB for HDFS/MR.
> >
> > -Jacques
> >
> >
> > On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray <[EMAIL PROTECTED]>
> > wrote:
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jacques [mailto:[EMAIL PROTECTED]]
> > > > Sent: Friday, July 30, 2010 1:16 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Memory Consumption and Processing questions
> > > >
> > > > Hello all,
> > > >
> > > > I'm planning an HBase implementation and had some questions I was
> > > > hoping someone could help with.
> > > >
> > > > 1. Can someone give me a basic overview of how memory is used in
> > > > HBase?  Various places on the web state that 16-24GB is the minimum
> > > > for region servers if they also operate as HDFS/MR nodes.  Assuming
> > > > that the HDFS/MR processes consume ~8GB, that leaves a "minimum" of
> > > > 8-16GB for HBase.  It seems like lots of people suggest using even
> > > > 24GB+ for HBase.  Why so much?  Is it simply to avoid GC problems?
> > > > To have data in memory for fast random reads?  Or?
> > >
> > > Where exactly are you reading this from?  I'm not actually aware of
> > > people using 24GB+ heaps for HBase.
> > >
> > > I would not recommend using less than 4GB for RegionServers.  Beyond
> > > that, it very much depends on your application.  8GB is often
> > > sufficient, but I've seen as much as 16GB used in production.
> > >
> > > You need at least 4GB because of GC.  General experience has been
> > > that below that, the CMS GC does not work well.
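
A minimal sketch of what that guidance translates to in conf/hbase-env.sh,
assuming a 0.20-era install; the values are illustrative, not recommendations
from this thread:

    # conf/hbase-env.sh (sketch)
    # RegionServer heap in MB; at least 4GB, per the GC point above.
    export HBASE_HEAPSIZE=4000
    # Use CMS and start old-generation collection early enough to
    # avoid long stop-the-world pauses.
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
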
> > >
> > > Memory is used primarily for the MemStores (write cache) and the
> > > Block Cache (read cache).  In addition, memory is allocated as part
> > > of normal operations to store in-memory state and in processing
> > > reads.
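
Both caches are sized as fractions of the RegionServer heap.  A sketch of the
relevant hbase-site.xml properties, assuming 0.20-era names and defaults; this
is also where the block cache vs. OS cache trade-off asked about above gets
tuned:

    <!-- hbase-site.xml (sketch) -->
    <property>
      <!-- Block Cache (reads): fraction of heap given to cached HFile
           blocks; shrink it to leave more room for the OS cache -->
      <name>hfile.block.cache.size</name>
      <value>0.2</value>
    </property>
    <property>
      <!-- MemStores (writes): global ceiling as a fraction of heap -->
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>
    </property>
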
> > >
> > > > 2. What types of things put more/less pressure on memory?  I saw
> > > > insinuation that insert speed can create substantial memory
> > > > pressure.  What type of relative memory pressure do scanners,
> > > > random reads, random writes, region quantity and compactions cause?
> > >
> > > Writes are buffered and flushed to disk when the write buffer gets
> > > to a local or global limit.  The local limit (per region) defaults
> > > to 64MB.  The global limit is based on the total amount of heap
> > > available (default, I think, is 40%).  So there is interplay between
> > > how much heap you have [...]
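
The two limits described above map to these settings, again as a sketch with
0.20-era names (the global ~40% ceiling is the upperLimit property shown
earlier):

    <!-- hbase-site.xml (sketch) -->
    <property>
      <!-- Per-region write buffer: flush to disk once it passes 64MB -->
      <name>hbase.hregion.memstore.flush.size</name>
      <value>67108864</value>
    </property>
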
Further replies in this thread:
  Edward Capriolo 2010-08-02, 15:39
  Jacques 2010-08-02, 23:00
  Jean-Daniel Cryans 2010-08-02, 23:21
  Jacques 2010-08-02, 23:41
  Jean-Daniel Cryans 2010-08-02, 23:44
  Jacques 2010-08-03, 00:27