We are running a number of Map/Reduce jobs on top of HBase. We are not using HBase for any of its realtime capabilities, only for batch-processing. So we aren't doing lookups, just scans.
Each of our jobs sets *scan.setCaching(false)* to turn off block caching, since each block will only be accessed once.
We recently started using Cloudera Manager, and I’m seeing something that doesn’t add up. See image below. It’s clear from the graphs that Block Cache is being used currently, and blocks are being cached and evicted.
We do have *hfile.block.cache.size* set to 0.4 (the default), but my understanding is that setting scan.setCaching(false) in the jobs should override this. Since it’s set in every job, no blocks should be cached.
Can anyone help me understand what we’re seeing?
[image: Inline image 1]
Re: hbase block-cache scan.setCaching(false) not being respected
The Block Cache is used for more than just scanner caching. Additionally, *hfile.block.cache.size* is a server-side config, while scan.setCaching(false) is set per request at the RPC level. So regardless of your setCaching value, the RegionServers will continue to allocate memory to the block cache.
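
For reference, here is a minimal sketch of where the server-side setting lives, assuming the usual hbase-site.xml on each RegionServer and the 0.4 value mentioned earlier in the thread. It may also be worth double-checking which client call the jobs actually make: in the HBase client API, Scan.setCacheBlocks(boolean) is the per-scan switch for block caching, whereas Scan.setCaching(int) takes a row count and controls how many rows each scanner RPC returns.

```xml
<!-- hbase-site.xml on each RegionServer (sketch; value taken from the thread) -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the RegionServer heap reserved for the block cache -->
  <value>0.4</value>
</property>
```

Because this is read by the RegionServer at startup, nothing a client sets on an individual Scan changes how much heap is reserved for the cache.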