HBase, mail # user - Memory Consumption and Processing questions


Thread:
- Jacques 2010-07-30, 20:15
- Jonathan Gray 2010-08-01, 18:39
- Jacques 2010-08-02, 00:29
- Jonathan Gray 2010-08-02, 04:08
- Jacques 2010-08-02, 15:33
- Edward Capriolo 2010-08-02, 15:39
- Jacques 2010-08-02, 23:00
- Jean-Daniel Cryans 2010-08-02, 23:21
Re: Memory Consumption and Processing questions
Jacques 2010-08-02, 23:41
Wow, with that in mind, it seems like block cache is way more important than
I originally thought (versus os cache).  It also precludes (or reduces)
effective use of things like l2arc ssds on OpenSolaris.  Thanks for pointing
that out.

Your mention of locality reminds me of a question that came up after reading
Lars George's excellent writeup here:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

Upon cluster restart, is there any "memory" of which region servers last
served which regions, or some other method to improve data locality?

I know I could get this answer by reviewing the code, but I just haven't
gotten to that level of detail yet.

thanks
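For context on the trade-off discussed above: the share of the region server heap given to the block cache is controlled by `hfile.block.cache.size` in hbase-site.xml. A minimal sketch, assuming an HBase release of this era (where the default was 0.2); the value shown is purely illustrative, not a recommendation:

```xml
<!-- hbase-site.xml: illustrative sketch only -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the RegionServer heap used for the HFile block cache -->
  <value>0.3</value>
</property>
```

Raising this fraction trades OS buffer cache (shared with the local DataNode) for cache that lives inside the region server's JVM.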

On Mon, Aug 2, 2010 at 4:21 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Something to keep in mind is that the block cache is within the region
> server's JVM, whereas it has to go on the network to get data from the
> DNs (which should always be slower even if it's in the OS cache). But,
> on a production system, regions don't move that much so the local DN
> should always contain the blocks for its RS's regions. If
> https://issues.apache.org/jira/browse/HDFS-347 were there, block
> caching could be almost useless if the OS is given a lot of room and
> there would be no need for IB and whatnot.
>
> J-D
>
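The local-read optimization J-D refers to (the HDFS-347 line of work) did eventually ship in later Hadoop releases as "short-circuit local reads". A sketch of how it is enabled there; these property names postdate this 2010 thread and the socket path is an illustrative assumption:

```xml
<!-- hdfs-site.xml: sketch only; from the short-circuit read feature
     that shipped in later Hadoop releases, not Hadoop as of 2010 -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```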
> On Mon, Aug 2, 2010 at 4:00 PM, Jacques <[EMAIL PROTECTED]> wrote:
> > Makes me wonder if high speed interconnects and little to no block
> > cache would work better--basically rely on each machine to hold the
> > highly used blocks in os cache and push them around quickly if they
> > are needed elsewhere.  Of course it's all just a thought experiment
> > at this point.  The cost of having high speed interconnects would
> > probably be substantially more than provisioning extra memory to hold
> > cached blocks twice.  There is also the thought that if the blocks
> > are cached by HBase, they would appear rarely used from the os
> > standpoint and are, therefore, unlikely to be in cache.
> >
> >
> >
> >
> > On Mon, Aug 2, 2010 at 8:39 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> >
> >> On Mon, Aug 2, 2010 at 11:33 AM, Jacques <[EMAIL PROTECTED]> wrote:
> >> > You're right, of course.  I shouldn't generalize too much.  I'm
> >> > more trying to understand the landscape than pinpoint anything
> >> > specific.
> >> >
> >> > Quick question: since the block cache is unaware of the location
> >> > of files, wouldn't it overlap the os cache for hfiles once they
> >> > are localized after compaction?  Any guidance on how to tune the
> >> > two?
> >> >
> >> > thanks,
> >> > Jacques
> >> >
> >> > On Sun, Aug 1, 2010 at 9:08 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> One reason not to extrapolate that is that leaving lots of memory
> >> >> for the linux buffer cache is a good way to improve overall
> >> >> performance of typically i/o bound applications like Hadoop and
> >> >> HBase.
> >> >>
> >> >> Also, I'm unsure that "most people use ~8 for hdfs/mr".  DataNodes
> >> >> generally require almost no significant memory (though generally
> >> >> run with 1GB); their performance will improve with more free
> >> >> memory for the os buffer cache.  As for MR, this completely
> >> >> depends on the tasks running.  The TaskTrackers also don't require
> >> >> significant memory, so this completely depends on the number of
> >> >> tasks per node and the memory requirements of the tasks.
> >> >>
> >> >> Unfortunately you can't always generalize the requirements too
> >> >> much, especially in MR.
> >> >>
> >> >> JG
> >> >>
> >> >> > -----Original Message-----
> >> >> > From: Jacques [mailto:[EMAIL PROTECTED]]
> >> >> > Sent: Sunday, August 01, 2010 5:30 PM
> >> >> > To: [EMAIL PROTECTED]
> >> >> > Subject: Re: Memory Consumption and Processing questions
> >> >> >
> >> >> > Thanks, that was very helpful.
> >> >> >
> >> >> > Regarding 24gb-- I saw people using servers with 32gb of server
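To make the memory arithmetic in Jonathan Gray's reply concrete, here is a small sketch of how the OS buffer cache budget falls out of the per-node heap allocations he describes. All figures are illustrative assumptions for a 2010-era node, not recommendations:

```python
# Hypothetical per-node memory budget for a Hadoop/HBase node,
# illustrating why MR makes the requirements hard to generalize:
# whatever the JVM heaps don't claim is left for the OS buffer cache.

def os_cache_left(total_gb, rs_heap_gb, dn_heap_gb, tt_heap_gb,
                  task_slots, per_task_gb):
    """Return GB left for the OS buffer cache after all JVM heaps."""
    used = rs_heap_gb + dn_heap_gb + tt_heap_gb + task_slots * per_task_gb
    return total_gb - used

# 32 GB node: 8 GB RegionServer, 1 GB DataNode, 1 GB TaskTracker,
# 6 task slots at 1 GB each -> 16 GB left for the buffer cache.
print(os_cache_left(32, 8, 1, 1, 6, 1))  # -> 16
```

Bumping the task count or per-task memory quickly eats the buffer cache (or oversubscribes the node), which is JG's point about MR dominating the budget.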
Thread (continued):
- Jean-Daniel Cryans 2010-08-02, 23:44
- Jacques 2010-08-03, 00:27