|
|
Varun Sharma 2013-01-24, 17:40
Hi,
I have a region server which has the following logs. As you can see from the log, ParNew is sufficiently big (450M) and there are heavy writes going in. I am seeing 200ms pauses which eventually build up and there is a promotion failure. There is a parnew collection every 2-3 seconds so it fills up real fast. My memstore size is bigger 512m for flushes and 4 regions per server. (overall size is 3G for all memstores) - I have mslab enabled
2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02 sys=0.00, real=0.04 secs] 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 secs] [Times: user=5.36 sys=0.34, real=4.27 secs] 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-abortable-preclean-start] 2013-01-24T13:08:27.401+0000: 63544.495: [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 sys=0.16, real=0.80 secs] 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 secs] 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start] 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs] 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K), 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs] 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew: 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K), 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs] 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217: [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep: 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
What might be advised here - should I up the size to 1G for the new generation ?
Thanks Varun
varun kumar 2013-01-24, 19:22
Hi Varun,
try to increase the heap memory.
Regards, Varun Kumar
On Thu, Jan 24, 2013 at 11:10 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Hi, > > I have a region server which has the following logs. As you can see from > the log, ParNew is sufficiently big (450M) and there are heavy writes going > in. I am seeing 200ms pauses which eventually build up and there is a > promotion failure. There is a parnew collection every 2-3 seconds so it > fills up real fast. My memstore size is bigger 512m for flushes and 4 > regions per server. (overall size is 3G for all memstores) - I have mslab > enabled > > 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: > 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), > 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] > 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: > 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), > 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] > 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: > 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), > 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] > 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: > 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02 > sys=0.00, real=0.04 secs] > 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] > 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: > 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), > 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] > 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 > secs] [Times: user=5.36 sys=0.34, real=4.27 secs] > 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] > 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: > 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] > 2013-01-24T13:08:26.597+0000: 63543.691: > [CMS-concurrent-abortable-preclean-start] > 2013-01-24T13:08:27.401+0000: 63544.495: > [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 > sys=0.16, real=0.80 secs] > 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872 > K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs > processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] > 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 > secs] > 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start] > 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: > 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), > 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs] > 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: > 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K), > 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs] > 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew: > 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K), > 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs] > 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion > failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217: > [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep: > 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*] > > What might be advised here - should I up the size to 1G for the new > generation ? > > Thanks > Varun >
-- Regards, Varun Kumar.P
Hi Varun,
Please note if you try to increase new generation size, then the "ParNew" time will be up accordingly, and CMS YGC is also a STW. could you have a try to reduce memstore size to a smaller value, e.g. 128m or 256m ?
Regards, Liang ________________________________________ 发件人: Varun Sharma [[EMAIL PROTECTED]] 发送时间: 2013年1月25日 1:40 收件人: [EMAIL PROTECTED] 主题: GC pause issues
Hi,
I have a region server which has the following logs. As you can see from the log, ParNew is sufficiently big (450M) and there are heavy writes going in. I am seeing 200ms pauses which eventually build up and there is a promotion failure. There is a parnew collection every 2-3 seconds so it fills up real fast. My memstore size is bigger 512m for flushes and 4 regions per server. (overall size is 3G for all memstores) - I have mslab enabled
2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02 sys=0.00, real=0.04 secs] 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 secs] [Times: user=5.36 sys=0.34, real=4.27 secs] 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-abortable-preclean-start] 2013-01-24T13:08:27.401+0000: 63544.495: [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 sys=0.16, real=0.80 secs] 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 secs] 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start] 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs] 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K), 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs] 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew: 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K), 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs] 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217: [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep: 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
What might be advised here - should I up the size to 1G for the new generation ?
Thanks Varun
-
Re: 答复: GC pause issues
Varun Sharma 2013-01-25, 02:30
I am curious how reducing the memstore size would help - I have 6 regions and 3G total for memstore - so I would like max out on that by having a bigger flush size per region. Are you asking me to have more regions and smaller memstore flush size instead ? How is that likely to help
On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <[EMAIL PROTECTED]> wrote:
> Hi Varun, > > Please note if you try to increase new generation size, then the "ParNew" > time will be up accordingly, and CMS YGC is also a STW. > could you have a try to reduce memstore size to a smaller value, e.g. 128m > or 256m ? > > Regards, > Liang > ________________________________________ > 发件人: Varun Sharma [[EMAIL PROTECTED]] > 发送时间: 2013年1月25日 1:40 > 收件人: [EMAIL PROTECTED] > 主题: GC pause issues > > Hi, > > I have a region server which has the following logs. As you can see from > the log, ParNew is sufficiently big (450M) and there are heavy writes going > in. I am seeing 200ms pauses which eventually build up and there is a > promotion failure. There is a parnew collection every 2-3 seconds so it > fills up real fast. My memstore size is bigger 512m for flushes and 4 > regions per server. (overall size is 3G for all memstores) - I have mslab > enabled > > 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: > 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), > 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] > 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: > 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), > 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] > 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: > 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), > 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] > 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: > 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02 > sys=0.00, real=0.04 secs] > 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] > 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: > 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), > 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] > 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 > secs] [Times: user=5.36 sys=0.34, real=4.27 secs] > 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] > 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: > 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] > 2013-01-24T13:08:26.597+0000: 63543.691: > [CMS-concurrent-abortable-preclean-start] > 2013-01-24T13:08:27.401+0000: 63544.495: > [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 > sys=0.16, real=0.80 secs] > 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872 > K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs > processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] > 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 > secs] > 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start] > 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: > 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), > 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs] > 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: > 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K), > 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs] > 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew: > 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K), > 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs] > 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion > failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217: > [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
-
Re: 答复: GC pause issues
Varun Sharma 2013-01-25, 05:02
I do have significant block cache churn and this issue is typical correlated with a huge increase in read latencies - could that be the reason for this - mslab should be taking care of the memstore related heap fragmentation ? Has anyone seen issues with block cache churn ?
On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> I am curious how reducing the memstore size would help - I have 6 regions > and 3G total for memstore - so I would like max out on that by having a > bigger flush size per region. Are you asking me to have more regions and > smaller memstore flush size instead ? How is that likely to help > > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <[EMAIL PROTECTED]> wrote: > >> Hi Varun, >> >> Please note if you try to increase new generation size, then the "ParNew" >> time will be up accordingly, and CMS YGC is also a STW. >> could you have a try to reduce memstore size to a smaller value, e.g. >> 128m or 256m ? >> >> Regards, >> Liang >> ________________________________________ >> 发件人: Varun Sharma [[EMAIL PROTECTED]] >> 发送时间: 2013年1月25日 1:40 >> 收件人: [EMAIL PROTECTED] >> 主题: GC pause issues >> >> Hi, >> >> I have a region server which has the following logs. As you can see from >> the log, ParNew is sufficiently big (450M) and there are heavy writes >> going >> in. I am seeing 200ms pauses which eventually build up and there is a >> promotion failure. There is a parnew collection every 2-3 seconds so it >> fills up real fast. My memstore size is bigger 512m for flushes and 4 >> regions per server. (overall size is 3G for all memstores) - I have mslab >> enabled >> >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: >> 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: >> 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: >> 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: >> user=0.02 >> sys=0.00, real=0.04 secs] >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: >> 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs] >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] >> 2013-01-24T13:08:26.597+0000: 63543.691: >> [CMS-concurrent-abortable-preclean-start] >> 2013-01-24T13:08:27.401+0000: 63544.495: >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 >> sys=0.16, real=0.80 secs] >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K >> (471872 >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 >> secs] >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start] >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: >> 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs] >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: >> 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
-
Re: 答复: GC pause issues
Jack Levin 2013-01-25, 07:46
Generally, the larger the flush to harder the GC will work. Flush more often to avoid this. What is your total heap size set at? On Jan 24, 2013 9:02 PM, "Varun Sharma" <[EMAIL PROTECTED]> wrote:
> I do have significant block cache churn and this issue is typical > correlated with a huge increase in read latencies - could that be the > reason for this - mslab should be taking care of the memstore related heap > fragmentation ? Has anyone seen issues with block cache churn ? > > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <[EMAIL PROTECTED]> wrote: > > > I am curious how reducing the memstore size would help - I have 6 regions > > and 3G total for memstore - so I would like max out on that by having a > > bigger flush size per region. Are you asking me to have more regions and > > smaller memstore flush size instead ? How is that likely to help > > > > > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <[EMAIL PROTECTED]> wrote: > > > >> Hi Varun, > >> > >> Please note if you try to increase new generation size, then the > "ParNew" > >> time will be up accordingly, and CMS YGC is also a STW. > >> could you have a try to reduce memstore size to a smaller value, e.g. > >> 128m or 256m ? > >> > >> Regards, > >> Liang > >> ________________________________________ > >> 发件人: Varun Sharma [[EMAIL PROTECTED]] > >> 发送时间: 2013年1月25日 1:40 > >> 收件人: [EMAIL PROTECTED] > >> 主题: GC pause issues > >> > >> Hi, > >> > >> I have a region server which has the following logs. As you can see from > >> the log, ParNew is sufficiently big (450M) and there are heavy writes > >> going > >> in. I am seeing 200ms pauses which eventually build up and there is a > >> promotion failure. There is a parnew collection every 2-3 seconds so it > >> fills up real fast. My memstore size is bigger 512m for flushes and 4 > >> regions per server. (overall size is 3G for all memstores) - I have > mslab > >> enabled > >> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: > >> 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: > >> 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: > >> 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: > >> user=0.02 > >> sys=0.00, real=0.04 secs] > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: > >> 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs] > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: > 4.105/4.268 > >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs] > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start] > >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: > >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs] > >> 2013-01-24T13:08:26.597+0000: 63543.691: > >> [CMS-concurrent-abortable-preclean-start] > >> 2013-01-24T13:08:27.401+0000: 63544.495: > >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 > >> sys=0.16, real=0.80 secs] > >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K > >> (471872 > >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs > >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] > >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, > real=0.06 > >> secs] > >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
-
Re: 答复: GC pause issues
Varun Sharma 2013-01-25, 17:55
My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I set the young gen low. The one thing I noted today was that the pauses are spaced @10 seconds until a minor compaction kicks in - and then then the pauses happen every 3-4 seconds. I found this to be highly correlated on a region server. These minor compactions are not huge - they are typically compacting 3 files - upto 200M in total size. They are lasting for 20-30 seconds. Once they are over the GC pauses recover back to the 10 second frequency.
I have the following settings enabled (index.cacheonwrite and bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes though i find it unlikely that they could be impacting my setup.
On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> Generally, the larger the flush to harder the GC will work. Flush more > often to avoid this. What is your total heap size set at? > On Jan 24, 2013 9:02 PM, "Varun Sharma" <[EMAIL PROTECTED]> wrote: > > > I do have significant block cache churn and this issue is typical > > correlated with a huge increase in read latencies - could that be the > > reason for this - mslab should be taking care of the memstore related > heap > > fragmentation ? Has anyone seen issues with block cache churn ? > > > > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <[EMAIL PROTECTED]> > wrote: > > > > > I am curious how reducing the memstore size would help - I have 6 > regions > > > and 3G total for memstore - so I would like max out on that by having a > > > bigger flush size per region. Are you asking me to have more regions > and > > > smaller memstore flush size instead ? How is that likely to help > > > > > > > > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <[EMAIL PROTECTED]> wrote: > > > > > >> Hi Varun, > > >> > > >> Please note if you try to increase new generation size, then the > > "ParNew" > > >> time will be up accordingly, and CMS YGC is also a STW. > > >> could you have a try to reduce memstore size to a smaller value, e.g. > > >> 128m or 256m ? > > >> > > >> Regards, > > >> Liang > > >> ________________________________________ > > >> 发件人: Varun Sharma [[EMAIL PROTECTED]] > > >> 发送时间: 2013年1月25日 1:40 > > >> 收件人: [EMAIL PROTECTED] > > >> 主题: GC pause issues > > >> > > >> Hi, > > >> > > >> I have a region server which has the following logs. As you can see > from > > >> the log, ParNew is sufficiently big (450M) and there are heavy writes > > >> going > > >> in. I am seeing 200ms pauses which eventually build up and there is a > > >> promotion failure. There is a parnew collection every 2-3 seconds so > it > > >> fills up real fast. My memstore size is bigger 512m for flushes and 4 > > >> regions per server. (overall size is 3G for all memstores) - I have > > mslab > > >> enabled > > >> > > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: > > >> 471841K->52416K(471872K), 0.2251880 secs] > 8733008K->8445039K(12727104K), > > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] > > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: > > >> 461593K->52416K(471872K), 0.2812690 secs] > 8854216K->8557572K(12727104K), > > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] > > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: > > >> 442836K->52416K(471872K), 0.2781490 secs] > 8947992K->8705355K(12727104K), > > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] > > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: > > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: > > >> user=0.02 > > >> sys=0.00, real=0.04 secs] > > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start] > > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: > > >> 471872K->52251K(471872K), 0.1611970 secs] > 9124811K->8831437K(12727104K), > > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
-
Re: 答复: GC pause issues
Jack Levin 2013-01-26, 06:54
How much memory are you flushing? Can you paste the log here? The large the chunk of the flush the longer your GC is going to be.
-Jack
On Fri, Jan 25, 2013 at 9:55 AM, Varun Sharma <[EMAIL PROTECTED]> wrote: > My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses > every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I > set the young gen low. The one thing I noted today was that the pauses are > spaced @10 seconds until a minor compaction kicks in - and then then the > pauses happen every 3-4 seconds. I found this to be highly correlated on a > region server. These minor compactions are not huge - they are typically > compacting 3 files - upto 200M in total size. They are lasting for 20-30 > seconds. Once they are over the GC pauses recover back to the 10 second > frequency. > > I have the following settings enabled (index.cacheonwrite and > bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes > though i find it unlikely that they could be impacting my setup. > > On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <[EMAIL PROTECTED]> wrote: > >> Generally, the larger the flush to harder the GC will work. Flush more >> often to avoid this. What is your total heap size set at? >> On Jan 24, 2013 9:02 PM, "Varun Sharma" <[EMAIL PROTECTED]> wrote: >> >> > I do have significant block cache churn and this issue is typical >> > correlated with a huge increase in read latencies - could that be the >> > reason for this - mslab should be taking care of the memstore related >> heap >> > fragmentation ? Has anyone seen issues with block cache churn ? >> > >> > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <[EMAIL PROTECTED]> >> wrote: >> > >> > > I am curious how reducing the memstore size would help - I have 6 >> regions >> > > and 3G total for memstore - so I would like max out on that by having a >> > > bigger flush size per region. Are you asking me to have more regions >> and >> > > smaller memstore flush size instead ? How is that likely to help >> > > >> > > >> > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <[EMAIL PROTECTED]> wrote: >> > > >> > >> Hi Varun, >> > >> >> > >> Please note if you try to increase new generation size, then the >> > "ParNew" >> > >> time will be up accordingly, and CMS YGC is also a STW. >> > >> could you have a try to reduce memstore size to a smaller value, e.g. >> > >> 128m or 256m ? >> > >> >> > >> Regards, >> > >> Liang >> > >> ________________________________________ >> > >> 发件人: Varun Sharma [[EMAIL PROTECTED]] >> > >> 发送时间: 2013年1月25日 1:40 >> > >> 收件人: [EMAIL PROTECTED] >> > >> 主题: GC pause issues >> > >> >> > >> Hi, >> > >> >> > >> I have a region server which has the following logs. As you can see >> from >> > >> the log, ParNew is sufficiently big (450M) and there are heavy writes >> > >> going >> > >> in. I am seeing 200ms pauses which eventually build up and there is a >> > >> promotion failure. There is a parnew collection every 2-3 seconds so >> it >> > >> fills up real fast. My memstore size is bigger 512m for flushes and 4 >> > >> regions per server. (overall size is 3G for all memstores) - I have >> > mslab >> > >> enabled >> > >> >> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: >> > >> 471841K->52416K(471872K), 0.2251880 secs] >> 8733008K->8445039K(12727104K), >> > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs] >> > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: >> > >> 461593K->52416K(471872K), 0.2812690 secs] >> 8854216K->8557572K(12727104K), >> > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs] >> > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: >> > >> 442836K->52416K(471872K), 0.2781490 secs] >> 8947992K->8705355K(12727104K), >> > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs] >> > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: >> > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
|
|