Re: Consumers constantly rebalancing
Neha Narkhede 2013-02-06, 01:47
>> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed has expired, closing socket connection
This can happen either due to long GC pauses on your client side or due to IO pauses on the zookeeper server side. That is the reason increasing the session timeout seems to have helped. If this error happens frequently, it will cause your consumer instances to keep rebalancing.
Thanks,
Neha

On Tue, Feb 5, 2013 at 5:41 PM, Manish Khettry <[EMAIL PROTECTED]> wrote:
> We are trying to troubleshoot a problem wherein our system just cannot
> seem to read messages fast enough from Kafka. We are on kafka 0.6 and are
> using the simple consumer.
>
> Looking at the logs, we see a lot of (almost constant, chatty) messages
> about rebalancing. For instance, every minute we see messages like this:
>
> Consumer rookery-vacuum-prod_<first_ip>.internal-1360106018385
> rebalancing the following partitions: List(0-0, 0-1, 0-10, 0-11, 0-12,
> 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, 0-4, 0-5, 0-6,
> 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16,
> 1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9) for topic
> compact-player-logs with consumers:
>
> I also see zookeeper timeouts like so:
>
> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> has expired, closing socket connection
>
> We increased the zookeeper session timeout from 6 seconds to 12 seconds and
> this seems to have helped somewhat, but I'm not sure if these zookeeper
> timeouts at 6 seconds are symptomatic of a problem with our zookeeper
> cluster and/or connectivity between the consumers and zk. Any thoughts?
>
> Manish
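To make the link between pauses and session expiry concrete, here is a minimal Python sketch. It assumes a simplified model in which a session expires whenever one side is unresponsive for longer than the session timeout; real ZooKeeper behavior also depends on heartbeat timing and network latency. The function name and the sample pause values are hypothetical and are not taken from the logs in this thread.

```python
def expiring_pauses(pause_seconds, session_timeout_s):
    """Return the pauses long enough to expire a session under a simplified model.

    Assumption: a session expires when the client is silent for longer than
    the session timeout. Heartbeat scheduling and network delay are ignored.
    """
    return [p for p in pause_seconds if p > session_timeout_s]


# Hypothetical stall durations in seconds (GC or IO), not measured data.
sample_pauses = [0.2, 7.5, 3.0, 13.1, 9.8]

for timeout in (6.0, 12.0):
    bad = expiring_pauses(sample_pauses, timeout)
    print(f"timeout={timeout}s -> {len(bad)} pause(s) would expire the session: {bad}")
```

Under this model, raising the timeout from 6 to 12 seconds only hides stalls that fall between the two values. That would be consistent with the change helping "somewhat" rather than removing the expirations entirely.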
Re: Consumers constantly rebalancing
Jay Kreps 2013-02-06, 04:28
The easiest way to diagnose is to enable GC logging on both the consumer and the zk instance and see if you have long pauses.
-Jay
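As a rough illustration of the suggestion above: on a HotSpot JVM of that era (Java 8 or earlier), GC logging is typically enabled with flags such as `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>`. The Python sketch below then scans such a log for pauses above a threshold. The log line layout is an assumption based on the usual `PrintGCDetails` output, the sample lines are made up for illustration, and `long_pauses` is a hypothetical helper name.

```python
import re

# Matches the collection duration that HotSpot prints as ", 0.0123456 secs]".
# Requiring ", " directly before the number skips the "real=0.01 secs]" entry
# in the [Times: ...] block.
PAUSE_RE = re.compile(r", (\d+\.\d+) secs\]")


def long_pauses(log_lines, threshold_s):
    """Yield (pause_seconds, line) for lines whose longest pause exceeds threshold_s."""
    for line in log_lines:
        durations = [float(m) for m in PAUSE_RE.findall(line)]
        if durations and max(durations) > threshold_s:
            yield max(durations), line.rstrip()


# Hypothetical log lines in the PrintGCDetails style, for illustration only.
sample_log = [
    "1.234: [GC [PSYoungGen: 1024K->128K(2048K)] 4096K->3200K(8192K), 0.1500000 secs] "
    "[Times: user=0.20 sys=0.00, real=0.15 secs]",
    "61.900: [Full GC [PSYoungGen: 128K->0K(2048K)] [ParOldGen: 7000K->6500K(8192K)] "
    "7128K->6500K(10240K), 7.2500000 secs] [Times: user=9.10 sys=0.10, real=7.25 secs]",
]

for pause, line in long_pauses(sample_log, threshold_s=1.0):
    print(f"{pause:.2f}s pause: {line}")
```

Setting the threshold near the configured session timeout (6 or 12 seconds here) narrows the output to pauses that could plausibly expire a session on their own.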
Re: Consumers constantly rebalancing
Manish Khettry 2013-02-06, 05:04
Definitely no long pauses on the consumer. I see a minor collection every second, each taking 0.1 to 0.2 seconds. That in itself seems a bit on the high side (~10-20% of time spent in GC), but I don't think that would cause a zk session timeout. Getting GC stats on the zookeeper side is a bit harder, since this is not a system we control!
So in your opinion, long GC pauses are the most likely explanation for this?
m
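The arithmetic behind the 10-20% figure, and how far those pauses are from the session timeout, can be checked with a short Python sketch. The pause length, collection interval, and 6-second timeout come from the messages above; the function name is hypothetical.

```python
def gc_overhead(pause_s, interval_s):
    """Fraction of wall-clock time spent in GC for one pause every interval_s seconds."""
    return pause_s / interval_s


session_timeout_s = 6.0  # original timeout mentioned in the thread
interval_s = 1.0         # one minor collection per second, as reported

for pause_s in (0.1, 0.2):
    overhead = gc_overhead(pause_s, interval_s)
    headroom = session_timeout_s / pause_s
    print(f"pause={pause_s}s overhead={overhead:.0%} timeout is {headroom:.0f}x the pause")
```

Each individual pause is more than an order of magnitude shorter than the timeout, so minor collections of this size are unlikely to expire a session by themselves. This does not rule out rarer full collections on the consumer, or stalls on the zookeeper side.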