Accumulo, mail # user - 0% Data Cache Hit Rate


Re: 0% Data Cache Hit Rate
Keith Turner 2013-03-12, 18:03
On Tue, Mar 12, 2013 at 11:46 AM, Slater, David M.
<[EMAIL PROTECTED]> wrote:
> Thanks Keith,
>
> I checked, and the values were all default (block caching disabled).
>
> Turning them on, however, dropped the data cache hit rate to single digits on all of the data nodes. I'm guessing that the queries I am running cannot be cached well because they need to go through so much data, and that the high percentages I was getting before came from the metadata table's data cache (since that is enabled by default).

That sounds correct.  Did you up the cache size?
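David's reasoning matches how an LRU-style block cache behaves under full scans. Below is a toy Java model, a sketch only. It assumes a plain LRU policy built on `java.util.LinkedHashMap`, which is simpler than the block cache Accumulo actually ships. The block counts, cache size and number of passes are made-up illustrative values. In this model, repeatedly scanning a table that fits in the cache yields a high hit rate after the first pass, while scanning one larger than the cache evicts every block before it is read again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a plain LRU cache, NOT Accumulo's block cache implementation.
public class ScanCacheDemo {

    /** Fixed-capacity LRU cache of block ids that counts hits and requests. */
    static final class LruBlockCache {
        private final LinkedHashMap<Integer, Boolean> blocks;
        private long hits = 0;
        private long requests = 0;

        LruBlockCache(final int maxBlocks) {
            // accessOrder = true: iteration order is least recently used first,
            // so removeEldestEntry evicts the LRU block once capacity is exceeded.
            blocks = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                    return size() > maxBlocks;
                }
            };
        }

        /** Reads one block through the cache: a hit if present, otherwise a miss that loads it. */
        void read(int blockId) {
            requests++;
            if (blocks.get(blockId) != null) {
                hits++; // get() also marks the block as most recently used
            } else {
                blocks.put(blockId, Boolean.TRUE); // may evict the LRU block
            }
        }

        double hitRate() {
            return requests == 0 ? 0.0 : (double) hits / requests;
        }
    }

    /** Hit rate after `passes` full sequential scans over `tableBlocks` blocks. */
    public static double scanHitRate(int tableBlocks, int cacheBlocks, int passes) {
        LruBlockCache cache = new LruBlockCache(cacheBlocks);
        for (int p = 0; p < passes; p++) {
            for (int b = 0; b < tableBlocks; b++) {
                cache.read(b);
            }
        }
        return cache.hitRate();
    }

    public static void main(String[] args) {
        // Table smaller than the cache: only the first pass misses.
        System.out.printf("table fits in cache: %.0f%% hits%n", 100 * scanHitRate(100, 1000, 10));
        // Table twice the cache size: each block is evicted before the next pass reaches it.
        System.out.printf("table exceeds cache: %.0f%% hits%n", 100 * scanHitRate(2000, 1000, 10));
    }
}
```

Running `main` prints 90% for the first case (100 misses out of 1,000 reads) and 0% for the second. So, at least in this simplified model, a larger data cache only helps a full scan once the scanned data actually fits in it.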

>
> Since data caching is disabled by default, I assume that there are downsides to using it. Is this primarily memory footprint?

Yeah, primarily memory. Being able to enable/disable it per table lets you
decide which tables get to use that memory.

>
> Regards,
> David
>
> -----Original Message-----
> From: Keith Turner [mailto:[EMAIL PROTECTED]]
> Sent: Monday, March 11, 2013 3:02 PM
> To: [EMAIL PROTECTED]
> Subject: Re: 0% Data Cache Hit Rate
>
> You may need to set the following property to true for the table.
> This enables caching data for a table.  It defaults to false.
>
> table.cache.block.enable
>
> Also take a look at the following props.  These determine how much memory a tserver uses for caching.
>
> tserver.cache.data.size
> tserver.cache.index.size
>
> The following prop enables caching RFile indexes for a table; it defaults to true.
>
> table.cache.index.enable
>
> Keith
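Keith's point that the cache trades memory for hit rate can be made concrete by adding up the two tablet-server cache settings. The Java sketch below assumes that the settings use `K`/`M`/`G` suffixes with binary (1024-based) multipliers and that both caches are allocated from the tablet server's JVM heap. The `parseMemory` and `totalCacheBytes` helpers and all sizes in `main` are hypothetical, chosen only for illustration.

```java
// Sketch under stated assumptions; helper names and values are hypothetical.
public class CacheBudget {

    /** Parses sizes like "512K", "128M", "1G" (binary multipliers assumed) or plain bytes. */
    public static long parseMemory(String value) {
        String v = value.trim().toUpperCase(java.util.Locale.ROOT);
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        char unit = v.charAt(v.length() - 1);
        long multiplier;
        if (unit == 'K') {
            multiplier = 1L << 10;
        } else if (unit == 'M') {
            multiplier = 1L << 20;
        } else if (unit == 'G') {
            multiplier = 1L << 30;
        } else {
            return Long.parseLong(v); // no suffix: value is already in bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Block-cache memory one tablet server reserves: data cache plus index cache. */
    public static long totalCacheBytes(String dataCacheSize, String indexCacheSize) {
        return parseMemory(dataCacheSize) + parseMemory(indexCacheSize);
    }

    public static void main(String[] args) {
        String data = "512M";           // hypothetical tserver.cache.data.size
        String index = "256M";          // hypothetical tserver.cache.index.size
        long heap = parseMemory("2G");  // hypothetical tablet server heap (-Xmx)
        long total = totalCacheBytes(data, index);
        System.out.printf("cache total: %d MB (%.1f%% of heap)%n",
                total >> 20, 100.0 * total / heap);
    }
}
```

With these made-up values the sketch prints `cache total: 768 MB (37.5% of heap)`. Under the heap assumption above, raising either setting leaves correspondingly less heap for the rest of the tablet server.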
>
>
> On Mon, Mar 11, 2013 at 2:48 PM, Slater, David M.
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>>
>>
>> I have a four-node setup, and I'm running some intensive query
>> operations that need to go through all of the rows (though only one or
>> two column families). While I don't expect this to be fast by any
>> means, I wanted to make sure that I had a decent baseline before
>> comparing this to more indexed versions of querying. Here is the
>> problem: Two of my nodes have very low data cache hit rates, and I
>> assume that this would greatly impact the query efficiency. Is this correct?
>>
>>
>>
>> All four of my nodes have a 99% index cache hit rate, but the data
>> cache hit rates are:
>>
>> Node 1: 96%
>> Node 2: 95%
>> Node 3: 67%
>> Node 4: 0%
>>
>> (All four are data nodes; the name node is #1.)
>>
>>
>>
>> I'm not seeing any warnings or errors in the logs, and I couldn't find
>> much online about it, so I thought I would check here. Does anyone
>> have a suggestion for how to fix it? Could this be related to the
>> system swappiness at all? (I currently have swappiness set to 0.)
>>
>>
>>
>> Thanks for the help,
>> David Slater