|
Jack Levin
2011-05-15, 02:12
Jack Levin
2011-05-16, 00:37
Ted Yu
2011-05-16, 02:09
Jack Levin
2011-05-16, 02:49
Jean-Daniel Cryans
2011-05-16, 18:53
Stack
2011-05-16, 19:02
Jack Levin
2011-05-16, 19:39
Stack
2011-05-16, 19:44
Jack Levin
2011-05-16, 20:36
Stack
2011-05-16, 20:48
Jack Levin
2011-05-16, 22:17
Jack Levin
2011-05-16, 23:11
Stack
2011-05-16, 23:19
Jack Levin
2011-05-17, 00:06
Andrew Purtell
2011-05-17, 02:41
Jack Levin
2011-05-17, 03:30
Stack
2011-05-17, 03:52
Jack Levin
2011-05-17, 04:55
Matt Corgan
2011-05-17, 14:27
Todd Lipcon
2011-05-17, 22:39
Jack Levin
2011-05-18, 00:09
Jack Levin
2011-05-19, 22:46
|
-
GC and High CPUJack Levin 2011-05-15, 02:12
Greetings all -- We had some issues with huge GC counts that drive CPU
to 100% and rendering RS basically dead, here is some data: We are normal here, about 1 ParNew per second. 67 2011-05-14T15:35 75 2011-05-14T15:36 70 2011-05-14T15:37 74 2011-05-14T15:38 73 2011-05-14T15:39 73 2011-05-14T15:40 69 2011-05-14T15:41 68 2011-05-14T15:42 76 2011-05-14T15:43 70 2011-05-14T15:44 68 2011-05-14T15:45 322 2011-05-14T15:46 562 2011-05-14T15:47 <------------ This is where its about 10 per second. 567 2011-05-14T15:48 563 2011-05-14T15:49 562 2011-05-14T15:50 561 2011-05-14T15:51 513 2011-05-14T15:52 here is the jstack from the time of this: http://pastebin.com/RYdwH8Gp Unusually high BLOCKED threads. This is config for hbase-env: export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log" Does any have any suggestions on how to manage this? -Jack
-
Re: GC and High CPUJack Levin 2011-05-16, 00:37
I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc
-XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log" But we had same issue. Whats strange is that the load on the RS would be around 1000 rpc calls, then we have hlog roll, and GC goes crazy, and kills all of the user CPU. I have another jstack: http://pastebin.com/4n3MvmrX from a different server. How do people deal with this? We have HEAP of 12G, block cace at 5G, and upper memstore at 1G. Is hlog roll that intensive? I though 1000 rpc calls per RS should be easy? The server is not at all busy pre-spike, you can tell by attached graph. -Jack On Sat, May 14, 2011 at 7:12 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > Greetings all -- We had some issues with huge GC counts that drive CPU > to 100% and rendering RS basically dead, here is some data: > > We are normal here, about 1 ParNew per second. > > 67 2011-05-14T15:35 > 75 2011-05-14T15:36 > 70 2011-05-14T15:37 > 74 2011-05-14T15:38 > 73 2011-05-14T15:39 > 73 2011-05-14T15:40 > 69 2011-05-14T15:41 > 68 2011-05-14T15:42 > 76 2011-05-14T15:43 > 70 2011-05-14T15:44 > 68 2011-05-14T15:45 > 322 2011-05-14T15:46 > 562 2011-05-14T15:47 <------------ This is where its about 10 per second. > 567 2011-05-14T15:48 > 563 2011-05-14T15:49 > 562 2011-05-14T15:50 > 561 2011-05-14T15:51 > 513 2011-05-14T15:52 > > here is the jstack from the time of this: > > http://pastebin.com/RYdwH8Gp Unusually high BLOCKED threads. > > This is config for hbase-env: > > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" > > Does any have any suggestions on how to manage this? > > -Jack >
-
Re: GC and High CPUTed Yu 2011-05-16, 02:09
Jack:
Your attachment got filtered. Can you upload somewhere ? Thanks On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc > -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" > > But we had same issue. Whats strange is that the load on the RS would > be around 1000 rpc calls, then we have hlog roll, and GC goes crazy, > and kills all of the user CPU. I have another jstack: > http://pastebin.com/4n3MvmrX from a different server. How do people > deal with this? We have HEAP of 12G, block cace at 5G, and upper > memstore at 1G. Is hlog roll that intensive? I though 1000 rpc calls > per RS should be easy? The server is not at all busy pre-spike, you > can tell by attached graph. > > -Jack > > On Sat, May 14, 2011 at 7:12 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > > Greetings all -- We had some issues with huge GC counts that drive CPU > > to 100% and rendering RS basically dead, here is some data: > > > > We are normal here, about 1 ParNew per second. > > > > 67 2011-05-14T15:35 > > 75 2011-05-14T15:36 > > 70 2011-05-14T15:37 > > 74 2011-05-14T15:38 > > 73 2011-05-14T15:39 > > 73 2011-05-14T15:40 > > 69 2011-05-14T15:41 > > 68 2011-05-14T15:42 > > 76 2011-05-14T15:43 > > 70 2011-05-14T15:44 > > 68 2011-05-14T15:45 > > 322 2011-05-14T15:46 > > 562 2011-05-14T15:47 <------------ This is where its about 10 per > second. > > 567 2011-05-14T15:48 > > 563 2011-05-14T15:49 > > 562 2011-05-14T15:50 > > 561 2011-05-14T15:51 > > 513 2011-05-14T15:52 > > > > here is the jstack from the time of this: > > > > http://pastebin.com/RYdwH8Gp Unusually high BLOCKED threads. > > > > This is config for hbase-env: > > > > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails > > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" > > > > Does any have any suggestions on how to manage this? > > > > -Jack > > >
-
Re: GC and High CPUJack Levin 2011-05-16, 02:49
Somewhere like imageshack :)
http://img221.imageshack.us/img221/761/screenshot20110515at537p.png -Jack On Sun, May 15, 2011 at 7:09 PM, Ted Yu <[EMAIL PROTECTED]> wrote: > Jack: > Your attachment got filtered. > Can you upload somewhere ? > > Thanks > > On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > >> I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc >> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError >> -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >> >> But we had same issue. Whats strange is that the load on the RS would >> be around 1000 rpc calls, then we have hlog roll, and GC goes crazy, >> and kills all of the user CPU. I have another jstack: >> http://pastebin.com/4n3MvmrX from a different server. How do people >> deal with this? We have HEAP of 12G, block cace at 5G, and upper >> memstore at 1G. Is hlog roll that intensive? I though 1000 rpc calls >> per RS should be easy? The server is not at all busy pre-spike, you >> can tell by attached graph. >> >> -Jack >> >> On Sat, May 14, 2011 at 7:12 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> > Greetings all -- We had some issues with huge GC counts that drive CPU >> > to 100% and rendering RS basically dead, here is some data: >> > >> > We are normal here, about 1 ParNew per second. >> > >> > 67 2011-05-14T15:35 >> > 75 2011-05-14T15:36 >> > 70 2011-05-14T15:37 >> > 74 2011-05-14T15:38 >> > 73 2011-05-14T15:39 >> > 73 2011-05-14T15:40 >> > 69 2011-05-14T15:41 >> > 68 2011-05-14T15:42 >> > 76 2011-05-14T15:43 >> > 70 2011-05-14T15:44 >> > 68 2011-05-14T15:45 >> > 322 2011-05-14T15:46 >> > 562 2011-05-14T15:47 <------------ This is where its about 10 per >> second. >> > 567 2011-05-14T15:48 >> > 563 2011-05-14T15:49 >> > 562 2011-05-14T15:50 >> > 561 2011-05-14T15:51 >> > 513 2011-05-14T15:52 >> > >> > here is the jstack from the time of this: >> > >> > http://pastebin.com/RYdwH8Gp Unusually high BLOCKED threads. >> > >> > This is config for hbase-env: >> > >> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails >> > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError >> > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >> > >> > Does any have any suggestions on how to manage this? >> > >> > -Jack >> > >> >
-
Re: GC and High CPUJean-Daniel Cryans 2011-05-16, 18:53
If you have a high insert rate then maybe log rolling (which blocks
inserts a little) makes it that the calls get queued enough (occupying heap) to make you enter a GC loop of death? Can you enable RPC logging and see if you can confirm that? Thx, J-D On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc > -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" > > But we had same issue. Whats strange is that the load on the RS would > be around 1000 rpc calls, then we have hlog roll, and GC goes crazy, > and kills all of the user CPU. I have another jstack: > http://pastebin.com/4n3MvmrX from a different server. How do people > deal with this? We have HEAP of 12G, block cace at 5G, and upper > memstore at 1G. Is hlog roll that intensive? I though 1000 rpc calls > per RS should be easy? The server is not at all busy pre-spike, you > can tell by attached graph. > > -Jack > > On Sat, May 14, 2011 at 7:12 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> Greetings all -- We had some issues with huge GC counts that drive CPU >> to 100% and rendering RS basically dead, here is some data: >> >> We are normal here, about 1 ParNew per second. >> >> 67 2011-05-14T15:35 >> 75 2011-05-14T15:36 >> 70 2011-05-14T15:37 >> 74 2011-05-14T15:38 >> 73 2011-05-14T15:39 >> 73 2011-05-14T15:40 >> 69 2011-05-14T15:41 >> 68 2011-05-14T15:42 >> 76 2011-05-14T15:43 >> 70 2011-05-14T15:44 >> 68 2011-05-14T15:45 >> 322 2011-05-14T15:46 >> 562 2011-05-14T15:47 <------------ This is where its about 10 per second. >> 567 2011-05-14T15:48 >> 563 2011-05-14T15:49 >> 562 2011-05-14T15:50 >> 561 2011-05-14T15:51 >> 513 2011-05-14T15:52 >> >> here is the jstack from the time of this: >> >> http://pastebin.com/RYdwH8Gp Unusually high BLOCKED threads. >> >> This is config for hbase-env: >> >> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError >> -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >> >> Does any have any suggestions on how to manage this? >> >> -Jack >> >
-
Re: GC and High CPUStack 2011-05-16, 19:02
On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc > -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:$HBASE_HOME/logs/gc-hbase.log" > Does the big CPU usage correlate to a Full GC? Do you notice that in your logs Jack? I took a look at your thread dumps. Nothing untoward (blocking on hdfs access). St.Ack
-
Re: GC and High CPUJack Levin 2011-05-16, 19:39
There HEAP would be 8G used out of 12G total. The gc-log would be
full of ParNew, no FULL GC at all, ParNew just starts at hight rate (10-15 lines per second). Even if you remove queries, the JVM will not recover, and user CPU will remain 100%. Only thing left to do is to kill the process. We are not writing in nearly as fast to case such an issue. This looks like a systemic problem, e.g. low enough load when it comes to QPS, and the problem effectively kills HBASE. There should be a way to tune things so that ParNew does not jump to an inappropriate levels. -Jack On Mon, May 16, 2011 at 12:02 PM, Stack <[EMAIL PROTECTED]> wrote: > On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc >> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError >> -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >> > > Does the big CPU usage correlate to a Full GC? Do you notice that in > your logs Jack? > > I took a look at your thread dumps. Nothing untoward (blocking on hdfs access). > > St.Ack >
-
Re: GC and High CPUStack 2011-05-16, 19:44
What is the size of your new gen? Is it growing? Does it level off?
St.Ack On Mon, May 16, 2011 at 12:39 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > There HEAP would be 8G used out of 12G total. The gc-log would be > full of ParNew, no FULL GC at all, ParNew just starts at hight rate > (10-15 lines per second). Even if you remove queries, the JVM will > not recover, and user CPU will remain 100%. Only thing left to do is > to kill the process. We are not writing in nearly as fast to case > such an issue. This looks like a systemic problem, e.g. low enough > load when it comes to QPS, and the problem effectively kills HBASE. > There should be a way to tune things so that ParNew does not jump to > an inappropriate levels. > > -Jack > > On Mon, May 16, 2011 at 12:02 PM, Stack <[EMAIL PROTECTED]> wrote: >> On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >>> I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc >>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails >>> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError >>> -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >>> >> >> Does the big CPU usage correlate to a Full GC? Do you notice that in >> your logs Jack? >> >> I took a look at your thread dumps. Nothing untoward (blocking on hdfs access). >> >> St.Ack >> >
-
Re: GC and High CPUJack Levin 2011-05-16, 20:36
How do you tell? This is the log entries when we had 100% cpu:
2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew: 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K), 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2011-05-14T15:48:58.349-0700: 5128.515: [GC 5128.515: [ParNew: 17804K->886K(19136K), 0.0188930 secs] 4309997K->4293119K(5777060K), 0.0189990 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2011-05-14T15:48:58.456-0700: 5128.623: [GC 5128.623: [ParNew: 17910K->900K(19136K), 0.0144800 secs] 4310143K->4293237K(5777060K), 0.0145890 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 2011-05-14T15:48:58.563-0700: 5128.729: [GC 5128.729: [ParNew: 17924K->945K(19136K), 0.0144940 secs] 4310261K->4293469K(5777060K), 0.0146210 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 2011-05-14T15:48:58.668-0700: 5128.835: [GC 5128.835: [ParNew: 17969K->731K(19136K), 0.0149060 secs] 4310493K->4293412K(5777060K), 0.0150210 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 2011-05-14T15:48:58.773-0700: 5128.940: [GC 5128.940: [ParNew: 17755K->906K(19136K), 0.0248650 secs] 4310436K->4293724K(5777060K), 0.0249820 secs] [Times: user=0.07 sys=0.00, real=0.03 secs] 2011-05-14T15:48:58.891-0700: 5129.057: [GC 5129.057: [ParNew: 17930K->1031K(19136K), 0.0194370 secs] 4310748K->4293961K(5777060K), 0.0196490 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.004-0700: 5129.171: [GC 5129.171: [ParNew: 18055K->960K(19136K), 0.0145450 secs] 4310985K->4293993K(5777060K), 0.0146530 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.115-0700: 5129.281: [GC 5129.282: [ParNew: 17984K->1098K(19136K), 0.0143740 secs] 4311017K->4294296K(5777060K), 0.0144780 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.224-0700: 5129.391: [GC 5129.391: [ParNew: 18122K->867K(19136K), 0.0150360 secs] 4311320K->4294248K(5777060K), 0.0151500 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.331-0700: 5129.498: [GC 5129.498: [ParNew: 17891K->1011K(19136K), 0.0220280 secs] 4311272K->4294544K(5777060K), 0.0221320 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.441-0700: 5129.608: [GC 5129.608: [ParNew: 18035K->1064K(19136K), 0.0164780 secs] 4311568K->4294803K(5777060K), 0.0165820 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.548-0700: 5129.714: [GC 5129.715: [ParNew: 18088K->1193K(19136K), 0.0141640 secs] 4311827K->4295008K(5777060K), 0.0142730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 2011-05-14T15:48:59.653-0700: 5129.820: [GC 5129.820: [ParNew: 18217K->1078K(19136K), 0.0150800 secs] 4312032K->4295035K(5777060K), 0.0151780 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.754-0700: 5129.921: [GC 5129.921: [ParNew: 18102K->906K(19136K), 0.0149160 secs] 4312059K->4295145K(5777060K), 0.0150130 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 2011-05-14T15:48:59.858-0700: 5130.025: [GC 5130.025: [ParNew: 17930K->749K(19136K), 0.0177820 secs] 4312169K->4295108K(5777060K), 0.0178890 secs] [Times: user=0.07 sys=0.00, real=0.01 secs] 2011-05-14T15:48:59.962-0700: 5130.129: [GC 5130.129: [ParNew: 17773K->970K(19136K), 0.0145400 secs] 4312132K->4295460K(5777060K), 0.0146530 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] -Jack On Mon, May 16, 2011 at 12:44 PM, Stack <[EMAIL PROTECTED]> wrote: > What is the size of your new gen? Is it growing? Does it level off? > St.Ack > > On Mon, May 16, 2011 at 12:39 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> There HEAP would be 8G used out of 12G total. The gc-log would be >> full of ParNew, no FULL GC at all, ParNew just starts at hight rate >> (10-15 lines per second). Even if you remove queries, the JVM will >> not recover, and user CPU will remain 100%. Only thing left to do is >> to kill the process. We are not writing in nearly as fast to case >> such an issue. This looks like a systemic problem, e.g. low enough >> load when it comes to QPS, and the problem effectively kills HBASE.
-
Re: GC and High CPUStack 2011-05-16, 20:48
On Mon, May 16, 2011 at 1:36 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> How do you tell? This is the log entries when we had 100% cpu: > > 2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew: > 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K), > 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] This doesn't look bad. Its going from 17.8MB to 0.8MB in 0.02 of real time. Looks like you new gen. is 19M at the moment which seems fine. You have set -Xmx == -Xms in your head because it would seem to say you are using 4.3G only. Looking at the rest of your log, we seem to be doing fine too; we're just cleaning out loads of garbage and we're doing it in young gen -- which is usually what you want -- and we're doing it pretty quick. Whats going on on your servers at the time? You said you took off the request load but maybe there is a bunch of work queued or we're running lots of compactions at the time? St.Ack > 2011-05-14T15:48:58.349-0700: 5128.515: [GC 5128.515: [ParNew: > 17804K->886K(19136K), 0.0188930 secs] 4309997K->4293119K(5777060K), > 0.0189990 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:58.456-0700: 5128.623: [GC 5128.623: [ParNew: > 17910K->900K(19136K), 0.0144800 secs] 4310143K->4293237K(5777060K), > 0.0145890 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:58.563-0700: 5128.729: [GC 5128.729: [ParNew: > 17924K->945K(19136K), 0.0144940 secs] 4310261K->4293469K(5777060K), > 0.0146210 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:58.668-0700: 5128.835: [GC 5128.835: [ParNew: > 17969K->731K(19136K), 0.0149060 secs] 4310493K->4293412K(5777060K), > 0.0150210 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] > 2011-05-14T15:48:58.773-0700: 5128.940: [GC 5128.940: [ParNew: > 17755K->906K(19136K), 0.0248650 secs] 4310436K->4293724K(5777060K), > 0.0249820 secs] [Times: user=0.07 sys=0.00, real=0.03 secs] > 2011-05-14T15:48:58.891-0700: 5129.057: [GC 5129.057: [ParNew: > 17930K->1031K(19136K), 0.0194370 secs] 4310748K->4293961K(5777060K), > 0.0196490 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.004-0700: 5129.171: [GC 5129.171: [ParNew: > 18055K->960K(19136K), 0.0145450 secs] 4310985K->4293993K(5777060K), > 0.0146530 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.115-0700: 5129.281: [GC 5129.282: [ParNew: > 17984K->1098K(19136K), 0.0143740 secs] 4311017K->4294296K(5777060K), > 0.0144780 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.224-0700: 5129.391: [GC 5129.391: [ParNew: > 18122K->867K(19136K), 0.0150360 secs] 4311320K->4294248K(5777060K), > 0.0151500 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.331-0700: 5129.498: [GC 5129.498: [ParNew: > 17891K->1011K(19136K), 0.0220280 secs] 4311272K->4294544K(5777060K), > 0.0221320 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.441-0700: 5129.608: [GC 5129.608: [ParNew: > 18035K->1064K(19136K), 0.0164780 secs] 4311568K->4294803K(5777060K), > 0.0165820 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.548-0700: 5129.714: [GC 5129.715: [ParNew: > 18088K->1193K(19136K), 0.0141640 secs] 4311827K->4295008K(5777060K), > 0.0142730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] > 2011-05-14T15:48:59.653-0700: 5129.820: [GC 5129.820: [ParNew: > 18217K->1078K(19136K), 0.0150800 secs] 4312032K->4295035K(5777060K), > 0.0151780 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.754-0700: 5129.921: [GC 5129.921: [ParNew: > 18102K->906K(19136K), 0.0149160 secs] 4312059K->4295145K(5777060K), > 0.0150130 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] > 2011-05-14T15:48:59.858-0700: 5130.025: [GC 5130.025: [ParNew: > 17930K->749K(19136K), 0.0177820 secs] 4312169K->4295108K(5777060K), > 0.0178890 secs] [Times: user=0.07 sys=0.00, real=0.01 secs] > 2011-05-14T15:48:59.962-0700: 5130.129: [GC 5130.129: [ParNew: > 17773K->970K(19136K), 0.0145400 secs] 4312132K->4295460K(5777060K),
-
Re: GC and High CPUJack Levin 2011-05-16, 22:17
hlog rolled, and qps was at 1000, not much at all. I tend to believe
that concurrent GC collection is taking way too many threads from the app layer and causing CPU to run up. Setting up incremental mode should help, I will do so, and report what I find. -Jack On Mon, May 16, 2011 at 1:48 PM, Stack <[EMAIL PROTECTED]> wrote: > On Mon, May 16, 2011 at 1:36 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> How do you tell? This is the log entries when we had 100% cpu: >> >> 2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew: >> 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K), >> 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] > > This doesn't look bad. Its going from 17.8MB to 0.8MB in 0.02 of real > time. Looks like you new gen. is 19M at the moment which seems fine. > You have set -Xmx == -Xms in your head because it would seem to say > you are using 4.3G only. > > Looking at the rest of your log, we seem to be doing fine too; we're > just cleaning out loads of garbage and we're doing it in young gen -- > which is usually what you want -- and we're doing it pretty quick. > > Whats going on on your servers at the time? You said you took off the > request load but maybe there is a bunch of work queued or we're > running lots of compactions at the time? > > > St.Ack > > >> 2011-05-14T15:48:58.349-0700: 5128.515: [GC 5128.515: [ParNew: >> 17804K->886K(19136K), 0.0188930 secs] 4309997K->4293119K(5777060K), >> 0.0189990 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:58.456-0700: 5128.623: [GC 5128.623: [ParNew: >> 17910K->900K(19136K), 0.0144800 secs] 4310143K->4293237K(5777060K), >> 0.0145890 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:58.563-0700: 5128.729: [GC 5128.729: [ParNew: >> 17924K->945K(19136K), 0.0144940 secs] 4310261K->4293469K(5777060K), >> 0.0146210 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:58.668-0700: 5128.835: [GC 5128.835: [ParNew: >> 17969K->731K(19136K), 0.0149060 secs] 4310493K->4293412K(5777060K), >> 0.0150210 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] >> 2011-05-14T15:48:58.773-0700: 5128.940: [GC 5128.940: [ParNew: >> 17755K->906K(19136K), 0.0248650 secs] 4310436K->4293724K(5777060K), >> 0.0249820 secs] [Times: user=0.07 sys=0.00, real=0.03 secs] >> 2011-05-14T15:48:58.891-0700: 5129.057: [GC 5129.057: [ParNew: >> 17930K->1031K(19136K), 0.0194370 secs] 4310748K->4293961K(5777060K), >> 0.0196490 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.004-0700: 5129.171: [GC 5129.171: [ParNew: >> 18055K->960K(19136K), 0.0145450 secs] 4310985K->4293993K(5777060K), >> 0.0146530 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.115-0700: 5129.281: [GC 5129.282: [ParNew: >> 17984K->1098K(19136K), 0.0143740 secs] 4311017K->4294296K(5777060K), >> 0.0144780 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.224-0700: 5129.391: [GC 5129.391: [ParNew: >> 18122K->867K(19136K), 0.0150360 secs] 4311320K->4294248K(5777060K), >> 0.0151500 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.331-0700: 5129.498: [GC 5129.498: [ParNew: >> 17891K->1011K(19136K), 0.0220280 secs] 4311272K->4294544K(5777060K), >> 0.0221320 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.441-0700: 5129.608: [GC 5129.608: [ParNew: >> 18035K->1064K(19136K), 0.0164780 secs] 4311568K->4294803K(5777060K), >> 0.0165820 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.548-0700: 5129.714: [GC 5129.715: [ParNew: >> 18088K->1193K(19136K), 0.0141640 secs] 4311827K->4295008K(5777060K), >> 0.0142730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] >> 2011-05-14T15:48:59.653-0700: 5129.820: [GC 5129.820: [ParNew: >> 18217K->1078K(19136K), 0.0150800 secs] 4312032K->4295035K(5777060K), >> 0.0151780 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] >> 2011-05-14T15:48:59.754-0700: 5129.921: [GC 5129.921: [ParNew:
-
Re: GC and High CPUJack Levin 2011-05-16, 23:11
I think this will resolve my issue, here is the output:
14 2011-05-16T15:58 13 2011-05-16T15:59 12 2011-05-16T16:00 14 2011-05-16T16:01 14 2011-05-16T16:02 13 2011-05-16T16:03 11 2011-05-16T16:04 12 2011-05-16T16:05 11 2011-05-16T16:06 16:06:55 10.103.7.27 root@mtag27:/usr/lib/hbase/logs $ tail -f gc-hbase.log 2011-05-16T16:06:05.528-0700: 1725.183: [GC 1725.183: [ParNew: 76665K->7644K(76672K), 0.0301560 secs] 3255607K->3194064K(12279488K) icms_dc=0 , 0.0302940 secs] [Times: user=0.09 sys=0.01, real=0.03 secs] 2011-05-16T16:06:11.457-0700: 1731.112: [GC 1731.112: [ParNew: 75800K->7168K(76672K), 0.0311670 secs] 3262220K->3198429K(12279488K) icms_dc=0 , 0.0312880 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 2011-05-16T16:06:16.546-0700: 1736.202: [GC 1736.202: [ParNew: 75328K->7674K(76672K), 0.0270760 secs] 3266589K->3203675K(12279488K) icms_dc=0 , 0.0271940 secs] [Times: user=0.09 sys=0.01, real=0.03 secs] 2011-05-16T16:06:21.407-0700: 1741.063: [GC 1741.063: [ParNew: 75832K->8512K(76672K), 0.0285340 secs] 3271833K->3209994K(12279488K) icms_dc=0 , 0.0286380 secs] [Times: user=0.07 sys=0.01, real=0.03 secs] 2011-05-16T16:06:26.571-0700: 1746.226: [GC 1746.226: [ParNew: 76627K->6049K(76672K), 0.0254590 secs] 3278109K->3213405K(12279488K) icms_dc=0 , 0.0255890 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] Now running GC less often with larger cleans. And here is another server that runs under same load: 40 2011-05-16T15:53 50 2011-05-16T15:54 45 2011-05-16T15:55 43 2011-05-16T15:56 39 2011-05-16T15:57 37 2011-05-16T15:58 37 2011-05-16T15:59 35 2011-05-16T16:00 36 2011-05-16T16:01 25 2011-05-16T16:02 16:08:39 10.103.7.25 root@mtag25:/usr/lib/hbase/logs $ tail -f gc-hbase.log 2011-05-16T16:08:29.154-0700: 171956.381: [GC 171956.382: [ParNew: 19130K->2112K(19136K), 0.0403890 secs] 6649843K->6634911K(11885508K), 0.0407000 secs] [Times: user=0.15 sys=0.01, real=0.04 secs] 2011-05-16T16:08:30.237-0700: 171957.464: [GC 171957.465: [ParNew: 19077K->2112K(19136K), 0.0367750 secs] 6651877K->6636722K(11885508K), 0.0369370 secs] [Times: user=0.14 sys=0.00, real=0.04 secs] 2011-05-16T16:08:31.864-0700: 171959.092: [GC 171959.092: [ParNew: 19120K->1656K(19136K), 0.0396040 secs] 6653731K->6637891K(11885508K), 0.0397500 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] 2011-05-16T16:08:33.551-0700: 171960.778: [GC 171960.779: [ParNew: 18620K->2112K(19136K), 0.0398640 secs] 6654855K->6638900K(11885508K), 0.0400910 secs] [Times: user=0.15 sys=0.00, real=0.04 secs] 2011-05-16T16:08:35.116-0700: 171962.344: [GC 171962.344: [ParNew: 19136K->2112K(19136K), 0.0375530 secs] 6655924K->6641070K(11885508K), 0.0377090 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] 2011-05-16T16:08:36.562-0700: 171963.790: [GC 171963.790: [ParNew: 19131K->1763K(19136K), 0.0453660 secs] 6658090K->6642325K(11885508K), 0.0455130 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] 2011-05-16T16:08:37.675-0700: 171964.903: [GC 171964.903: [ParNew: 18733K->1400K(19136K), 0.0457320 secs] 6659295K->6643257K(11885508K), 0.0459240 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] Way more often, less to clean, and CPU usage is visibly higher. And GC config changes are: export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=2G" # Uncomment below to enable java garbage collection logging. export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms12000m -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \ -XX:+CMSIncrementalMode \ <-------- -XX:+CMSIncrementalPacing \ <------- -XX:-TraceClassUnloading <--------------- This way GC statistically adapts, and lesses the load on CPU. -Jack On Mon, May 16, 2011 at 3:17 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > hlog rolled, and qps was at 1000, not much at all. I tend to believe > that concurrent GC collection is taking way too many threads from the
-
Re: GC and High CPUStack 2011-05-16, 23:19
So, the change is that you started using CMS? You were using the
default GC previous? ParNew is much bigger now. St.Ack On Mon, May 16, 2011 at 4:11 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > I think this will resolve my issue, here is the output: > > 14 2011-05-16T15:58 > 13 2011-05-16T15:59 > 12 2011-05-16T16:00 > 14 2011-05-16T16:01 > 14 2011-05-16T16:02 > 13 2011-05-16T16:03 > 11 2011-05-16T16:04 > 12 2011-05-16T16:05 > 11 2011-05-16T16:06 > 16:06:55 10.103.7.27 root@mtag27:/usr/lib/hbase/logs $ tail -f gc-hbase.log > 2011-05-16T16:06:05.528-0700: 1725.183: [GC 1725.183: [ParNew: > 76665K->7644K(76672K), 0.0301560 secs] 3255607K->3194064K(12279488K) > icms_dc=0 , 0.0302940 secs] [Times: user=0.09 sys=0.01, real=0.03 > secs] > 2011-05-16T16:06:11.457-0700: 1731.112: [GC 1731.112: [ParNew: > 75800K->7168K(76672K), 0.0311670 secs] 3262220K->3198429K(12279488K) > icms_dc=0 , 0.0312880 secs] [Times: user=0.09 sys=0.00, real=0.03 > secs] > 2011-05-16T16:06:16.546-0700: 1736.202: [GC 1736.202: [ParNew: > 75328K->7674K(76672K), 0.0270760 secs] 3266589K->3203675K(12279488K) > icms_dc=0 , 0.0271940 secs] [Times: user=0.09 sys=0.01, real=0.03 > secs] > 2011-05-16T16:06:21.407-0700: 1741.063: [GC 1741.063: [ParNew: > 75832K->8512K(76672K), 0.0285340 secs] 3271833K->3209994K(12279488K) > icms_dc=0 , 0.0286380 secs] [Times: user=0.07 sys=0.01, real=0.03 > secs] > 2011-05-16T16:06:26.571-0700: 1746.226: [GC 1746.226: [ParNew: > 76627K->6049K(76672K), 0.0254590 secs] 3278109K->3213405K(12279488K) > icms_dc=0 , 0.0255890 secs] [Times: user=0.08 sys=0.00, real=0.03 > secs] > > > Now running GC less often with larger cleans. And here is another > server that runs under same load: > > 40 2011-05-16T15:53 > 50 2011-05-16T15:54 > 45 2011-05-16T15:55 > 43 2011-05-16T15:56 > 39 2011-05-16T15:57 > 37 2011-05-16T15:58 > 37 2011-05-16T15:59 > 35 2011-05-16T16:00 > 36 2011-05-16T16:01 > 25 2011-05-16T16:02 > > > 16:08:39 10.103.7.25 root@mtag25:/usr/lib/hbase/logs $ tail -f gc-hbase.log > 2011-05-16T16:08:29.154-0700: 171956.381: [GC 171956.382: [ParNew: > 19130K->2112K(19136K), 0.0403890 secs] 6649843K->6634911K(11885508K), > 0.0407000 secs] [Times: user=0.15 sys=0.01, real=0.04 secs] > 2011-05-16T16:08:30.237-0700: 171957.464: [GC 171957.465: [ParNew: > 19077K->2112K(19136K), 0.0367750 secs] 6651877K->6636722K(11885508K), > 0.0369370 secs] [Times: user=0.14 sys=0.00, real=0.04 secs] > 2011-05-16T16:08:31.864-0700: 171959.092: [GC 171959.092: [ParNew: > 19120K->1656K(19136K), 0.0396040 secs] 6653731K->6637891K(11885508K), > 0.0397500 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] > 2011-05-16T16:08:33.551-0700: 171960.778: [GC 171960.779: [ParNew: > 18620K->2112K(19136K), 0.0398640 secs] 6654855K->6638900K(11885508K), > 0.0400910 secs] [Times: user=0.15 sys=0.00, real=0.04 secs] > 2011-05-16T16:08:35.116-0700: 171962.344: [GC 171962.344: [ParNew: > 19136K->2112K(19136K), 0.0375530 secs] 6655924K->6641070K(11885508K), > 0.0377090 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] > 2011-05-16T16:08:36.562-0700: 171963.790: [GC 171963.790: [ParNew: > 19131K->1763K(19136K), 0.0453660 secs] 6658090K->6642325K(11885508K), > 0.0455130 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] > 2011-05-16T16:08:37.675-0700: 171964.903: [GC 171964.903: [ParNew: > 18733K->1400K(19136K), 0.0457320 secs] 6659295K->6643257K(11885508K), > 0.0459240 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] > > > Way more often, less to clean, and CPU usage is visibly higher. And > GC config changes are: > > > export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC > -XX:MaxDirectMemorySize=2G" > > # Uncomment below to enable java garbage collection logging. > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms12000m > -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError > -Xloggc:$HBASE_HOME/logs/gc-hbase.log \ > -XX:+CMSIncrementalMode \ <--------
-
Re: GC and High CPUJack Levin 2011-05-17, 00:06
Those are the lines I added:
-XX:+CMSIncrementalMode \ <-------- -XX:+CMSIncrementalPacing \ <------- -XX:-TraceClassUnloading <------ -Jack (used CMS before) On Mon, May 16, 2011 at 4:19 PM, Stack <[EMAIL PROTECTED]> wrote: > So, the change is that you started using CMS? You were using the > default GC previous? > > ParNew is much bigger now. > > St.Ack > > On Mon, May 16, 2011 at 4:11 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> I think this will resolve my issue, here is the output: >> >> 14 2011-05-16T15:58 >> 13 2011-05-16T15:59 >> 12 2011-05-16T16:00 >> 14 2011-05-16T16:01 >> 14 2011-05-16T16:02 >> 13 2011-05-16T16:03 >> 11 2011-05-16T16:04 >> 12 2011-05-16T16:05 >> 11 2011-05-16T16:06 >> 16:06:55 10.103.7.27 root@mtag27:/usr/lib/hbase/logs $ tail -f gc-hbase.log >> 2011-05-16T16:06:05.528-0700: 1725.183: [GC 1725.183: [ParNew: >> 76665K->7644K(76672K), 0.0301560 secs] 3255607K->3194064K(12279488K) >> icms_dc=0 , 0.0302940 secs] [Times: user=0.09 sys=0.01, real=0.03 >> secs] >> 2011-05-16T16:06:11.457-0700: 1731.112: [GC 1731.112: [ParNew: >> 75800K->7168K(76672K), 0.0311670 secs] 3262220K->3198429K(12279488K) >> icms_dc=0 , 0.0312880 secs] [Times: user=0.09 sys=0.00, real=0.03 >> secs] >> 2011-05-16T16:06:16.546-0700: 1736.202: [GC 1736.202: [ParNew: >> 75328K->7674K(76672K), 0.0270760 secs] 3266589K->3203675K(12279488K) >> icms_dc=0 , 0.0271940 secs] [Times: user=0.09 sys=0.01, real=0.03 >> secs] >> 2011-05-16T16:06:21.407-0700: 1741.063: [GC 1741.063: [ParNew: >> 75832K->8512K(76672K), 0.0285340 secs] 3271833K->3209994K(12279488K) >> icms_dc=0 , 0.0286380 secs] [Times: user=0.07 sys=0.01, real=0.03 >> secs] >> 2011-05-16T16:06:26.571-0700: 1746.226: [GC 1746.226: [ParNew: >> 76627K->6049K(76672K), 0.0254590 secs] 3278109K->3213405K(12279488K) >> icms_dc=0 , 0.0255890 secs] [Times: user=0.08 sys=0.00, real=0.03 >> secs] >> >> >> Now running GC less often with larger cleans. And here is another >> server that runs under same load: >> >> 40 2011-05-16T15:53 >> 50 2011-05-16T15:54 >> 45 2011-05-16T15:55 >> 43 2011-05-16T15:56 >> 39 2011-05-16T15:57 >> 37 2011-05-16T15:58 >> 37 2011-05-16T15:59 >> 35 2011-05-16T16:00 >> 36 2011-05-16T16:01 >> 25 2011-05-16T16:02 >> >> >> 16:08:39 10.103.7.25 root@mtag25:/usr/lib/hbase/logs $ tail -f gc-hbase.log >> 2011-05-16T16:08:29.154-0700: 171956.381: [GC 171956.382: [ParNew: >> 19130K->2112K(19136K), 0.0403890 secs] 6649843K->6634911K(11885508K), >> 0.0407000 secs] [Times: user=0.15 sys=0.01, real=0.04 secs] >> 2011-05-16T16:08:30.237-0700: 171957.464: [GC 171957.465: [ParNew: >> 19077K->2112K(19136K), 0.0367750 secs] 6651877K->6636722K(11885508K), >> 0.0369370 secs] [Times: user=0.14 sys=0.00, real=0.04 secs] >> 2011-05-16T16:08:31.864-0700: 171959.092: [GC 171959.092: [ParNew: >> 19120K->1656K(19136K), 0.0396040 secs] 6653731K->6637891K(11885508K), >> 0.0397500 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] >> 2011-05-16T16:08:33.551-0700: 171960.778: [GC 171960.779: [ParNew: >> 18620K->2112K(19136K), 0.0398640 secs] 6654855K->6638900K(11885508K), >> 0.0400910 secs] [Times: user=0.15 sys=0.00, real=0.04 secs] >> 2011-05-16T16:08:35.116-0700: 171962.344: [GC 171962.344: [ParNew: >> 19136K->2112K(19136K), 0.0375530 secs] 6655924K->6641070K(11885508K), >> 0.0377090 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] >> 2011-05-16T16:08:36.562-0700: 171963.790: [GC 171963.790: [ParNew: >> 19131K->1763K(19136K), 0.0453660 secs] 6658090K->6642325K(11885508K), >> 0.0455130 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] >> 2011-05-16T16:08:37.675-0700: 171964.903: [GC 171964.903: [ParNew: >> 18733K->1400K(19136K), 0.0457320 secs] 6659295K->6643257K(11885508K), >> 0.0459240 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] >> >> >> Way more often, less to clean, and CPU usage is visibly higher. And >> GC config changes are: >> >> >> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC >> -XX:MaxDirectMemorySize=2G"
-
Re: GC and High CPUAndrew Purtell 2011-05-17, 02:41
This is interesting because our "conventional wisdom" is those settings should increase the chance of stop-the-world GC and should be avoided.
- Andy (who always gets nervous when we start talking about GC black magic) > From: Jack Levin <[EMAIL PROTECTED]> > Subject: Re: GC and High CPU > To: [EMAIL PROTECTED] > Date: Monday, May 16, 2011, 5:06 PM > > Those are the lines I added: > > -XX:+CMSIncrementalMode \ <-------- > -XX:+CMSIncrementalPacing \ <------- > -XX:-TraceClassUnloading <------ > > -Jack > (used CMS before) > > On Mon, May 16, 2011 at 4:19 PM, Stack <[EMAIL PROTECTED]> > wrote: > > So, the change is that you started using CMS? You > > were using the default GC previous? > > > > ParNew is much bigger now.
-
Re: GC and High CPUJack Levin 2011-05-17, 03:30
I think in our case we have a deadlock of not cleaning garbage in large
enough chunks; being stuck in high cpu is as good as being dead Jack On May 16, 2011 4:41 PM, "Andrew Purtell" <[EMAIL PROTECTED]> wrote: > This is interesting because our "conventional wisdom" is those settings should increase the chance of stop-the-world GC and should be avoided. > > - Andy > (who always gets nervous when we start talking about GC black magic) > >> From: Jack Levin <[EMAIL PROTECTED]> >> Subject: Re: GC and High CPU >> To: [EMAIL PROTECTED] >> Date: Monday, May 16, 2011, 5:06 PM >> >> Those are the lines I added: >> >> -XX:+CMSIncrementalMode \ <-------- >> -XX:+CMSIncrementalPacing \ <------- >> -XX:-TraceClassUnloading <------ >> >> -Jack >> (used CMS before) >> >> On Mon, May 16, 2011 at 4:19 PM, Stack <[EMAIL PROTECTED]> >> wrote: >> > So, the change is that you started using CMS? You >> > were using the default GC previous? >> > >> > ParNew is much bigger now. >
-
Re: GC and High CPUStack 2011-05-17, 03:52
I don't understand what of the below made a difference though the
difference is plain from the GC logs you show. See below: On Mon, May 16, 2011 at 5:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > Those are the lines I added: > > -XX:+CMSIncrementalMode \ <-------- >From the doc., it says about i-cms and java6 "This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2)." [See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms] Don't you have > 2 processors per machine? St.Ack
-
Re: GC and High CPUJack Levin 2011-05-17, 04:55
This is the way I read it. "Low processors" == "high CPU tasks, e.g. high
load". So, Incremental takes GC down a number of notches when it comes to competing with CPU for APP threads. That being the case the deadlock is less likely. It would be useful to add code to the RS that will start blocking any RPC calls should this condition be detected, if we block say for 10 seconds, GC could finish doing it 'dirty' work and release CPU :) -Jack On May 16, 2011 5:53 PM, "Stack" <[EMAIL PROTECTED]> wrote: > I don't understand what of the below made a difference though the > difference is plain from the GC logs you show. > > See below: > > On Mon, May 16, 2011 at 5:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> Those are the lines I added: >> >> -XX:+CMSIncrementalMode \ <-------- > > > From the doc., it says about i-cms and java6 "This feature is useful > when applications that need the low pause times provided by the > concurrent collector are run on machines with small numbers of > processors (e.g., 1 or 2)." [See > http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms ] > Don't you have > 2 processors per machine? > > St.Ack
-
Re: GC and High CPUMatt Corgan 2011-05-17, 14:27
Isn't a 20mb young generation pretty small these days? For a 4gb heap, you
might want to try giving 5-10% of it to the young generation by setting -XX:NewSize=256M Some of the older docs<http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.Sizing%20the%20Generations%7Coutline>have a ratio of 1/3 between young/old, so that would be 1gb young, but that is probably bigger than you need. I'm far from an expert though... what size do other people use? Matt On Tue, May 17, 2011 at 12:55 AM, Jack Levin <[EMAIL PROTECTED]> wrote: > This is the way I read it. "Low processors" == "high CPU tasks, e.g. high > load". So, Incremental takes GC down a number of notches when it comes to > competing with CPU for APP threads. That being the case the deadlock is > less likely. It would be useful to add code to the RS that will start > blocking any RPC calls should this condition be detected, if we block say > for 10 seconds, GC could finish doing it 'dirty' work and release CPU :) > > > -Jack > > > On May 16, 2011 5:53 PM, "Stack" <[EMAIL PROTECTED]> wrote: > > I don't understand what of the below made a difference though the > > difference is plain from the GC logs you show. > > > > See below: > > > > On Mon, May 16, 2011 at 5:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > >> Those are the lines I added: > >> > >> -XX:+CMSIncrementalMode \ <-------- > > > > > > From the doc., it says about i-cms and java6 "This feature is useful > > when applications that need the low pause times provided by the > > concurrent collector are run on machines with small numbers of > > processors (e.g., 1 or 2)." [See > > > http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms > ] > > Don't you have > 2 processors per machine? > > > > St.Ack >
-
Re: GC and High CPUTodd Lipcon 2011-05-17, 22:39
My rule of thumb is "as big as possible but no bigger". The young gen size
determines the pause time and frequency of your young generation collections, which are stop-the-world. In my experience the speed is about 0.5-1 second per GB on modern hardware. You get better throughput and less fragmentatoin with a larger young genreation, but the pause times go up too. So, if you have a throughput oriented app, sure, let the JVM use its adaptive sizing - otherwise, constrain based on your latency SLAs. -Todd On Tue, May 17, 2011 at 7:27 AM, Matt Corgan <[EMAIL PROTECTED]> wrote: > Isn't a 20mb young generation pretty small these days? For a 4gb heap, you > might want to try giving 5-10% of it to the young generation by setting > -XX:NewSize=256M > > Some of the older > docs< > http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.Sizing%20the%20Generations%7Coutline > >have > a ratio of 1/3 between young/old, so that would be 1gb young, but that > is probably bigger than you need. I'm far from an expert though... what > size do other people use? > > Matt > > > On Tue, May 17, 2011 at 12:55 AM, Jack Levin <[EMAIL PROTECTED]> wrote: > > > This is the way I read it. "Low processors" == "high CPU tasks, e.g. high > > load". So, Incremental takes GC down a number of notches when it comes > to > > competing with CPU for APP threads. That being the case the deadlock is > > less likely. It would be useful to add code to the RS that will start > > blocking any RPC calls should this condition be detected, if we block say > > for 10 seconds, GC could finish doing it 'dirty' work and release CPU :) > > > > > > -Jack > > > > > > On May 16, 2011 5:53 PM, "Stack" <[EMAIL PROTECTED]> wrote: > > > I don't understand what of the below made a difference though the > > > difference is plain from the GC logs you show. > > > > > > See below: > > > > > > On Mon, May 16, 2011 at 5:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > > >> Those are the lines I added: > > >> > > >> -XX:+CMSIncrementalMode \ <-------- > > > > > > > > > From the doc., it says about i-cms and java6 "This feature is useful > > > when applications that need the low pause times provided by the > > > concurrent collector are run on machines with small numbers of > > > processors (e.g., 1 or 2)." [See > > > > > > http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms > > ] > > > Don't you have > 2 processors per machine? > > > > > > St.Ack > > > -- Todd Lipcon Software Engineer, Cloudera
-
Re: GC and High CPUJack Levin 2011-05-18, 00:09
Well, our issue was that we take an 8 core box, doing 18MB ParNew GC,
do it super fast (10x1sec), receive RPC calls to RS at the same time, pile them up due to CPU contention, do GC even faster, run out of user CPU. This is basically the cycle of cpu death. It seems that larger RAM boxes doing this: 4 2011-05-17T16:52 4 2011-05-17T16:53 4 2011-05-17T16:54 4 2011-05-17T16:55 4 times per minute, is just great for me: [ParNew: 191744K->21248K(191744K), 0.0361360 secs] -Jack -jack On Tue, May 17, 2011 at 3:39 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > My rule of thumb is "as big as possible but no bigger". The young gen size > determines the pause time and frequency of your young generation > collections, which are stop-the-world. In my experience the speed is about > 0.5-1 second per GB on modern hardware. > > You get better throughput and less fragmentatoin with a larger young > genreation, but the pause times go up too. > > So, if you have a throughput oriented app, sure, let the JVM use its > adaptive sizing - otherwise, constrain based on your latency SLAs. > > -Todd > > On Tue, May 17, 2011 at 7:27 AM, Matt Corgan <[EMAIL PROTECTED]> wrote: > >> Isn't a 20mb young generation pretty small these days? For a 4gb heap, you >> might want to try giving 5-10% of it to the young generation by setting >> -XX:NewSize=256M >> >> Some of the older >> docs< >> http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.Sizing%20the%20Generations%7Coutline >> >have >> a ratio of 1/3 between young/old, so that would be 1gb young, but that >> is probably bigger than you need. I'm far from an expert though... what >> size do other people use? >> >> Matt >> >> >> On Tue, May 17, 2011 at 12:55 AM, Jack Levin <[EMAIL PROTECTED]> wrote: >> >> > This is the way I read it. "Low processors" == "high CPU tasks, e.g. high >> > load". So, Incremental takes GC down a number of notches when it comes >> to >> > competing with CPU for APP threads. That being the case the deadlock is >> > less likely. It would be useful to add code to the RS that will start >> > blocking any RPC calls should this condition be detected, if we block say >> > for 10 seconds, GC could finish doing it 'dirty' work and release CPU :) >> > >> > >> > -Jack >> > >> > >> > On May 16, 2011 5:53 PM, "Stack" <[EMAIL PROTECTED]> wrote: >> > > I don't understand what of the below made a difference though the >> > > difference is plain from the GC logs you show. >> > > >> > > See below: >> > > >> > > On Mon, May 16, 2011 at 5:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> > >> Those are the lines I added: >> > >> >> > >> -XX:+CMSIncrementalMode \ <-------- >> > > >> > > >> > > From the doc., it says about i-cms and java6 "This feature is useful >> > > when applications that need the low pause times provided by the >> > > concurrent collector are run on machines with small numbers of >> > > processors (e.g., 1 or 2)." [See >> > > >> > >> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms >> > ] >> > > Don't you have > 2 processors per machine? >> > > >> > > St.Ack >> > >> > > > > -- > Todd Lipcon > Software Engineer, Cloudera >
-
Re: GC and High CPUJack Levin 2011-05-19, 22:46
so far its going well, no more crazy GC, while load remains the same.
2011-05-18T11:01:53.149-0700: 52288.388: [CMS-concurrent-preclean: 5.764/216.720 secs] [Times: user=103.71 sys=15.85, real=216.68 secs] 2011-05-18T11:01:53.149-0700: 52288.388: [CMS-concurrent-abortable-preclean-start] 2011-05-18T11:01:53.270-0700: 52288.509: [CMS-concurrent-abortable-preclean: 0.117/0.121 secs] [Times: user=0.13 sys=0.01, real=0.13 secs] 2011-05-18T11:01:53.271-0700: 52288.510: [GC[YG occupancy: 104181 K (191744 K)]52288.510: [Rescan (parallel) , 0.0217820 secs]52288.532: [weak refs processing, 0.0010760 secs] [1 CMS-remark: 11810687K(12075008K)] 11914869K(12266752K), 0.0230420 secs] [Times: user=0.19 sys=0.00, real=0.02 secs] 2011-05-18T11:01:53.294-0700: 52288.534: [CMS-concurrent-sweep-start] preclean took a long time, but other then that, we remain super stable, and using HEAP to the max. -Jack On Mon, May 16, 2011 at 7:41 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote: > This is interesting because our "conventional wisdom" is those settings should increase the chance of stop-the-world GC and should be avoided. > > - Andy > (who always gets nervous when we start talking about GC black magic) > >> From: Jack Levin <[EMAIL PROTECTED]> >> Subject: Re: GC and High CPU >> To: [EMAIL PROTECTED] >> Date: Monday, May 16, 2011, 5:06 PM >> >> Those are the lines I added: >> >> -XX:+CMSIncrementalMode \ <-------- >> -XX:+CMSIncrementalPacing \ <------- >> -XX:-TraceClassUnloading <------ >> >> -Jack >> (used CMS before) >> >> On Mon, May 16, 2011 at 4:19 PM, Stack <[EMAIL PROTECTED]> >> wrote: >> > So, the change is that you started using CMS? You >> > were using the default GC previous? >> > >> > ParNew is much bigger now. > > |