Slater, David M. 2013-03-11, 18:48
Hi,
I have a four-node setup, and I'm running some intensive query operations that need to go through all of the rows (though only one or two column families). While I don't expect this to be fast by any means, I wanted to make sure that I had a decent baseline before comparing this to more indexed versions of querying. Here is the problem: two of my nodes have very low data cache hit rates, and I assume that this would greatly impact the query efficiency. Is this correct?
All four of my nodes have a 99% index cache hit rate, but the data cache hit rates are:
Node 1: 96%
Node 2: 95%
Node 3: 67%
Node 4: 0%
(All four are data nodes; the name node is #1)
I'm not seeing any warnings or errors in the logs, and I couldn't find much online about it, so I thought I would check here. Does anyone have a suggestion for how to fix it? Could this be related to the system swappiness at all? (I currently have swappiness set to 0.)
Thanks for the help, David Slater
-
Re: 0% Data Cache Hit Rate
Keith Turner 2013-03-11, 19:02
You may need to set the following property to true for the table. This enables caching data for a table. It defaults to false.
table.cache.block.enable
Also take a look at the following props. These determine how much memory a tserver uses for caching.
tserver.cache.data.size
tserver.cache.index.size
The following property enables caching of rfile indexes for a table. It defaults to true.
table.cache.index.enable
Keith
-
RE: 0% Data Cache Hit Rate
Slater, David M. 2013-03-12, 15:46
Thanks Keith,
I checked, and the values were all default. (cache block disabled)
Turning them on, however, dropped the data cache hit rate to single digits on all of the data nodes. I'm guessing that the queries I am running, since they need to go through so much data, cannot be cached well, and that the high percentages I was getting before were due to the use of the metatable data cache (since that is enabled by default).
Since data caching is disabled by default, I assume that there are downsides to using it. Is this primarily memory footprint?
Regards, David
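The guess above, that a scan over far more data than the cache can hold gets almost no hits, can be illustrated with a small simulation. This is only a sketch: it assumes a plain LRU cache as a stand-in (the tserver's real eviction policy may differ), and the function name `scan_hit_rate` and the block counts are hypothetical values chosen for illustration.

```python
from collections import OrderedDict


def scan_hit_rate(num_blocks, cache_blocks, passes=1):
    """Return the LRU hit rate for `passes` sequential scans over `num_blocks` blocks."""
    cache = OrderedDict()  # block id -> None, ordered oldest to newest
    hits = lookups = 0
    for _ in range(passes):
        for block in range(num_blocks):
            lookups += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)  # mark as most recently used
            else:
                cache[block] = None
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / lookups


# Table much larger than the cache: each block is evicted before it is read again.
print(scan_hit_rate(num_blocks=10_000, cache_blocks=1_000, passes=3))  # 0.0
# Table fits in the cache: only the first pass misses.
print(scan_hit_rate(num_blocks=800, cache_blocks=1_000, passes=3))
```

Under these assumptions, repeated full scans only benefit from the data cache once the scanned data fits in it, which is consistent with the single-digit hit rates reported above.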
-
Re: 0% Data Cache Hit Rate
Keith Turner 2013-03-12, 18:03
On Tue, Mar 12, 2013 at 11:46 AM, Slater, David M. <[EMAIL PROTECTED]> wrote:
> Turning them on, however, turned the data cache hit rate down to single digits for all of the data nodes. I'm guessing that the queries I am running, since they need to go through so much data, cannot be cached well, and that the high percentages I was getting before were due to the use of the metatable data cache (since that is enabled by default).
That sounds correct. Did you up the cache size?
> Since data caching is disabled by default, I assume that there are downsides to using it. Is this primarily memory footprint?
Yeah, primarily memory. Being able to enable/disable it allows you to decide which tables you want to use that memory.
-
RE: 0% Data Cache Hit Rate
Slater, David M. 2013-03-12, 20:24
I didn't up the cache size. Are the defaults tuned to running just the metatable data cache?
To "decide which tables you want to use that memory", is there an additional property to set? tserver.cache.data.size you can set on a per-tserver basis, but how do you do it on a per table basis?
-
Re: 0% Data Cache Hit Rate
Keith Turner 2013-03-12, 20:41
On Tue, Mar 12, 2013 at 4:24 PM, Slater, David M. <[EMAIL PROTECTED]> wrote:
> I didn't up the cache size. Are the defaults tuned to running just the metatable data cache?
Yeah, they are basically good for the metadata table out of the box. As you turn on the cache for other tables, you may want to consider upping it.
> To "decide which tables you want to use that memory", is there an additional property to set?
Setting table.cache.block.enable to true will turn data caching on for a table. For example, doing the following in the shell would turn on data caching for table foo:
config -t foo -s table.cache.block.enable=true
> tserver.cache.data.size you can set on a per-tserver basis, but how do you do it on a per table basis?
The property sets the total cache memory available on a tserver. I would recommend upping it for your experiments, like the following.
config -s tserver.cache.data.size=2G
You may need to up the java -Xmx setting. I cannot remember if you will need to restart Accumulo for these settings to take effect.
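The -Xmx remark comes down to simple arithmetic: the data and index caches have to fit inside the tserver heap with room left for everything else. The sketch below assumes both caches are allocated from the Java heap, which is what the remark implies. The helper names, the 50% headroom figure, and the 512M index cache size are hypothetical illustrations, not Accumulo defaults or recommendations.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '2G' or '512M' to bytes; a bare number is bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def caches_fit(xmx, data_cache, index_cache, headroom=0.5):
    """True if both caches leave at least `headroom` of the heap for everything else."""
    heap = parse_size(xmx)
    used = parse_size(data_cache) + parse_size(index_cache)
    return used <= heap * (1 - headroom)


# A 2G data cache cannot fit in a 2G heap, so -Xmx would have to grow first.
print(caches_fit(xmx="2G", data_cache="2G", index_cache="512M"))  # False
print(caches_fit(xmx="6G", data_cache="2G", index_cache="512M"))  # True
```

The headroom fraction is the part to tune for a real deployment, since the tserver also needs heap for its other work.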