HBase >> mail # user >> Does HBase RegionServer benefit from OS Page Cache


Re: Does HBase RegionServer benefit from OS Page Cache
Coming up is the following enhancement which would make MSLAB even better:

HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB

FYI
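
For readers unfamiliar with the idea, the general technique behind a chunk pool can be sketched as below. This is only a minimal illustration and not the HBASE-8163 patch: the class name `ChunkPoolSketch`, its methods, and the sizes used are all hypothetical. The point it shows is that fixed-size chunks are handed back to a bounded pool and reused, instead of being dropped and left for the garbage collector.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of chunk pooling; not HBase's actual implementation.
final class ChunkPoolSketch {
    private final int chunkSize;
    private final BlockingQueue<byte[]> pool;

    ChunkPoolSketch(int chunkSize, int maxPooledChunks) {
        this.chunkSize = chunkSize;
        this.pool = new ArrayBlockingQueue<>(maxPooledChunks);
    }

    // Reuse a pooled chunk if one is available, otherwise allocate a new one.
    byte[] getChunk() {
        byte[] chunk = pool.poll();
        return (chunk != null) ? chunk : new byte[chunkSize];
    }

    // Hand a chunk back for reuse. Returns false if the chunk has the wrong
    // size or the pool is full; in that case it is simply left to the GC.
    boolean putBack(byte[] chunk) {
        if (chunk.length != chunkSize) {
            return false;
        }
        return pool.offer(chunk);
    }

    int pooledCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        ChunkPoolSketch p = new ChunkPoolSketch(2 * 1024 * 1024, 4);
        byte[] a = p.getChunk();
        p.putBack(a);
        byte[] b = p.getChunk();
        System.out.println(a == b); // prints "true": the same array was reused
    }
}
```

Bounding the pool is a deliberate choice in this sketch: it caps how much memory is held back from the collector when the write load drops.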

On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:

> Thanks a lot for the explanation. It's good to know that MSLAB is stable
> and safe to enable (we don't have it enabled right now, we're using 0.92).
> This would allow us to more freely allocate memory to HBase. I really
> enjoyed the depth of explanation from both Enis and J-D. I was indeed
> mistakenly referring to HFile as HLog; fortunately you were still able
> to understand my question.
>
> Thanks,
> Pankaj
> On Mar 21, 2013, at 1:28 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>
> > I think the page cache is not totally useless, but as long as you can
> > control the GC, you should prefer the block cache. Some of the reasons off
> > the top of my head:
> > - In case of a cache hit in the OS cache, you have to go through the DN
> > layer (an RPC if short-circuit reads are disabled), make a kernel jump, and
> > read using the libc read() call. For reading a block from the block cache,
> > only the HBase process is involved: there is no process switch and no
> > kernel jump.
> > - The read access path is optimized per hfile block. FS page boundaries
> > and hfile block boundaries are not aligned at all.
> > - There is very little control over what the page cache caches or does not
> > cache based on expected access patterns. For example, we can mark META region
> > blocks, some column families, and hfile index blocks as always cached or
> > cached with high priority. Also, for full table scans, we can explicitly
> > disable block caching so as not to trash the current working set. With the
> > OS page cache, you do not have this control.
> >
> > Enis
> >
> >
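
The last point above, that a full scan with block caching disabled leaves the working set alone, can be sketched with a toy model. This is a hypothetical illustration only: `ScanCachingSketch` is not HBase's block cache, and the block IDs and capacity are made up. In the HBase client the corresponding switch is `Scan.setCacheBlocks(false)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of an LRU block cache; not HBase's implementation.
final class ScanCachingSketch {
    private final Map<String, byte[]> cache;

    ScanCachingSketch(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    // Read a block; on a miss, "load" it and cache it only if cacheBlocks is true.
    byte[] read(String blockId, boolean cacheBlocks) {
        byte[] block = cache.get(blockId);
        if (block == null) {
            block = new byte[0]; // stand-in for a read from HDFS
            if (cacheBlocks) {
                cache.put(blockId, block);
            }
        }
        return block;
    }

    boolean isCached(String blockId) {
        return cache.containsKey(blockId);
    }

    public static void main(String[] args) {
        ScanCachingSketch c = new ScanCachingSketch(2);
        c.read("hot-1", true);
        c.read("hot-2", true);
        // A full scan with caching disabled leaves the hot blocks in place.
        for (int i = 0; i < 100; i++) {
            c.read("scan-" + i, false);
        }
        System.out.println(c.isCached("hot-1") && c.isCached("hot-2")); // prints "true"
        // The same scan with caching enabled evicts them.
        for (int i = 0; i < 100; i++) {
            c.read("scan-" + i, true);
        }
        System.out.println(c.isCached("hot-1") || c.isCached("hot-2")); // prints "false"
    }
}
```

The OS page cache offers no equivalent per-read switch from inside HBase, which is the contrast being drawn.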
> > On Wed, Mar 20, 2013 at 10:30 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> >> First, MSLAB has been enabled by default since 0.92.0 as it was deemed
> >> stable enough. So, unless you are on 0.90, you are already using it.
> >>
> >> Also, I'm not sure why you are referencing the HLog in your first
> >> paragraph in the context of reading from disk, because the HLogs are
> >> rarely read (only on recovery). Maybe you meant HFile?
> >>
> >> In any case, your email covers most arguments except for one:
> >> checksumming. Retrieving a block from HDFS, even when using short
> >> circuit reads to go directly to the OS instead of passing through the
> >> DN, will take quite a bit more time than reading directly from the
> >> block cache. This is why, even if you disable block caching on a family,
> >> the index and root blocks will still be block cached, as reading
> >> those very hot blocks from disk would take way too long.
> >>
> >> Regarding your main question (how does the OS buffer help?), I don't
> >> have a good answer. It kind of depends on the amount of RAM you have
> >> and what your workload is like. As a data point, I've been successful
> >> running with 24GB of heap (50% dedicated to the block cache) with a
> >> workload consisting mainly of small writes, short scans, and a typical
> >> random read distribution for a website. I can't remember the last time
> >> I saw a full GC and it's been running for more than a year like this.
> >>
> >> Hope this somehow helps,
> >>
> >> J-D
> >>
> >> On Wed, Mar 20, 2013 at 12:34 AM, Pankaj Gupta <[EMAIL PROTECTED]>
> >> wrote:
> >>> Given that HBase has its own cache (block cache and bloom filters) and
> >>> that all the table data is stored in HDFS, I'm wondering if HBase benefits
> >>> from OS page cache at all. In the setup I'm using, HBase Region Servers run
> >>> on the same boxes as the HDFS data node. In such a scenario, if the
> >>> underlying HLog files live on the same machine, then having a healthy
> >>> memory surplus may mean that the data node can serve the underlying file from
> >>> page cache, thus improving HBase performance. Is this really the case?
> >>> (I guess page cache should also help in case where HLog file lives on a