The easiest way to diagnose this is to enable GC logging on both the
consumer and the zk instance and check whether you see long pauses.
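
Here is a rough sketch of how you might scan a GC log for such pauses. It
assumes the JVM was started with HotSpot flags along the lines of
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log, so
that collections are followed by a "[Times: user=... sys=..., real=N secs]"
line. The function name and the 6-second default threshold (the original
session timeout mentioned below) are only illustrative.

```python
import re

# Matches the wall-clock pause reported by -XX:+PrintGCDetails, e.g.
#   "... [Times: user=0.10 sys=0.00, real=7.12 secs]"
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=6.0):
    """Return (line_number, pause_secs) for every pause >= threshold_secs.

    `lines` is any iterable of GC log lines (an open file works).
    `threshold_secs` is a hypothetical cutoff; set it to your ZooKeeper
    session timeout to find pauses long enough to expire a session.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_PAUSE.search(line)
        if match is None:
            continue  # not a "[Times: ...]" line
        pause = float(match.group(1))
        if pause >= threshold_secs:
            hits.append((number, pause))
    return hits
```

Calling long_pauses(open("gc.log")) on the consumer's log and on the
ZooKeeper server's log should show whether either side stalls for longer
than the session timeout. If nothing shows up, GC is probably not the cause
and IO on the zk side is the next thing to look at.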
On Tue, Feb 5, 2013 at 5:46 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> has expired, closing socket connection
> This can happen either due to long GC pauses on your client side or due to
> IO pauses on the zookeeper server side.
> That is the reason increasing the session timeout seems to have helped.
> If this error happens frequently, it will cause your consumer instances to
> keep rebalancing.
> On Tue, Feb 5, 2013 at 5:41 PM, Manish Khettry <[EMAIL PROTECTED]> wrote:
> > We are trying to troubleshoot a problem wherein our system just cannot
> > seem to read messages fast enough from Kafka. We are on Kafka 0.6 and are
> > using the simple consumer.
> > Looking at the logs, we see a lot of (almost constant, chatty) messages
> > about rebalancing. For instance, every minute we see something
> > like this:
> > Consumer rookery-vacuum-prod_<first_ip>.internal-1360106018385
> > rebalancing the following partitions: List(0-0, 0-1, 0-10, 0-11, 0-12,
> > 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, 0-4, 0-5, 0-6,
> > 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16,
> > 1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9) for topic
> > compact-player-logs with consumers:
> > I also see zookeeper timeouts like so:
> > Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > has expired, closing socket connection
> > We increased the zookeeper session timeout from 6 seconds to 12 seconds.
> > This seems to have helped somewhat, but I'm not sure whether these
> > zookeeper timeouts at 6 seconds are symptomatic of a problem with our
> > zookeeper cluster and/or connectivity between the consumers and zk.
> > Any thoughts?
> > Manish