We are running a number of Map/Reduce jobs on top of HBase. We are not using HBase for any of its realtime capabilities, only for batch-processing. So we aren't doing lookups, just scans.
Each one of our jobs calls *scan.setCacheBlocks(false)* to turn off block caching, since each block will only be accessed once.
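For reference, a minimal sketch of the per-job Scan setup, assuming the standard org.apache.hadoop.hbase.client API (the caching value of 500 here is illustrative, not from our actual jobs):

```java
import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCacheBlocks(false); // don't populate the server-side block cache with scanned blocks
scan.setCaching(500);       // rows fetched per RPC; unrelated to the block cache
```

Note that *setCaching(int)* controls scanner row batching, a different knob from *setCacheBlocks(boolean)*.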
We recently started using Cloudera Manager, and I’m seeing something that doesn’t add up. See image below. It’s clear from the graphs that the block cache is currently in use: blocks are being cached and evicted.
We do have *hfile.block.cache.size* set to 0.4 (the default), but my understanding was that jobs calling scan.setCacheBlocks(false) would override this. Since it’s set in every job, there should be no blocks being cached.
The block cache is used for more than just scanner reads. Additionally, *hfile.block.cache.size* is a server-side config, while scan.setCacheBlocks(false) is a client-side, per-RPC setting. So regardless of your setCacheBlocks value, the RegionServers will continue to allocate memory to the block cache.
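To illustrate the server-side point: the RegionServer sizes its LRU block cache as a fraction of its heap (hfile.block.cache.size) at startup, and a client flag can only opt out of populating it for one scan, not shrink it. A self-contained toy sketch of that sizing-and-eviction behavior (plain Java for illustration, not HBase code; the 10-block "heap" and names are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch {

    // An LRU cache capped at a fixed number of blocks, mimicking how the
    // server-side cache size is fixed by config, not by client requests.
    static <K, V> Map<K, V> lruCache(final int maxBlocks) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxBlocks; // evict the least-recently-used block
            }
        };
    }

    public static void main(String[] args) {
        // e.g. a fraction of 0.4 over a hypothetical 10-block heap -> 4 slots
        Map<String, byte[]> cache = lruCache((int) (10 * 0.4));
        for (int i = 0; i < 6; i++) {
            cache.put("block-" + i, new byte[0]);
        }
        System.out.println(cache.size());                 // capped at 4
        System.out.println(cache.containsKey("block-0")); // oldest block evicted
    }
}
```

The takeaway: the cache exists and evicts on the server whether or not any given scan asks to use it.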
Bryan - I believe you're right, but wanted to confirm.
Thanks, -Matt On Mon, Jun 2, 2014 at 4:09 PM, Ted Yu <[EMAIL PROTECTED]> wrote: