Kafka user mailing list: Consumers constantly rebalancing


Re: Consumers constantly rebalancing
Definitely no long pauses on the consumer. I see a minor collection every
second that takes 0.1 to 0.2 seconds. That in itself seems a bit high
(roughly 10-20% of the time spent in GC), but I don't think it would cause
a ZK session timeout. Getting GC stats on the ZooKeeper side is harder,
though: this is not a system we control!

So, in your opinion, long GC pauses are the most likely explanation for this?

m
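The 10-20% figure is straightforward arithmetic: one minor collection per second with each pause lasting 0.1 to 0.2 s means 10-20% of wall-clock time is spent paused. The minimal Python sketch below computes that ratio from a list of pause records. The `Pause` record format is a hypothetical stand-in for whatever your GC log provides, and the pause values are illustrative, chosen to fall inside the range quoted above. It also reports the longest pause, because a session timeout is sensitive to a single long pause, not to the average overhead.

```python
from dataclasses import dataclass

@dataclass
class Pause:
    start_s: float     # seconds since JVM start when the pause began (hypothetical record format)
    duration_s: float  # how long application threads were stopped

def gc_overhead(pauses: list[Pause], window_s: float) -> float:
    """Fraction of a wall-clock window spent in GC pauses."""
    return sum(p.duration_s for p in pauses) / window_s

def longest_pause(pauses: list[Pause]) -> float:
    """Longest single pause; 0.0 if there were none."""
    return max((p.duration_s for p in pauses), default=0.0)

# Illustrative data: one minor collection per second for 10 s, alternating 0.1 s and 0.2 s
pauses = [Pause(start_s=float(i), duration_s=0.1 if i % 2 == 0 else 0.2) for i in range(10)]
print(f"overhead: {gc_overhead(pauses, window_s=10.0):.0%}")  # overhead: 15%
print(f"longest pause: {longest_pause(pauses)} s vs. a 6 s session timeout")
```

Even at 20% overhead, sub-second pauses are an order of magnitude shorter than a 6-second session timeout, which supports the suspicion that consumer GC alone is not expiring the session.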
On Tue, Feb 5, 2013 at 8:27 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> The easiest way to diagnose is to enable GC logging on both the consumer
> and the zk instance and see if you have long pauses.
>
> -Jay
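One way to follow this advice is to scan a HotSpot GC log for stop-the-world pauses above a threshold. The sketch below assumes a Java 6-8 JVM started with `-XX:+PrintGCApplicationStoppedTime`, which writes lines containing `Total time for which application threads were stopped: <n> seconds`. Other JVMs and the Java 9+ unified logging format differ, so treat the regular expression as an assumption to adapt. The sample lines are made up to match that assumed format, and the 1-second threshold is arbitrary.

```python
import re
from typing import Iterable

# Assumed HotSpot (Java 6-8) line format produced by -XX:+PrintGCApplicationStoppedTime.
STOPPED_RE = re.compile(r"Total time for which application threads were stopped: ([0-9.]+) seconds")

def long_pauses(lines: Iterable[str], threshold_s: float) -> list[float]:
    """Return every stop-the-world pause (in seconds) at or above threshold_s."""
    found = []
    for line in lines:
        m = STOPPED_RE.search(line)
        if m and float(m.group(1)) >= threshold_s:
            found.append(float(m.group(1)))
    return found

# Made-up sample lines in the assumed format
sample = [
    "2013-02-05T20:01:02.345-0800: 12.345: Total time for which application threads were stopped: 0.1520 seconds",
    "2013-02-05T20:01:09.001-0800: 19.001: Total time for which application threads were stopped: 7.2031 seconds",
]
print(long_pauses(sample, threshold_s=1.0))  # [7.2031]
```

File objects iterate line by line, so an open log file (for example `with open(path) as f: long_pauses(f, 1.0)`) can be passed in place of the sample list. Any pause approaching the ZooKeeper session timeout is a strong candidate for the expired sessions.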
>
>
> On Tue, Feb 5, 2013 at 5:46 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > >> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > >> has expired, closing socket connection
> >
> > This can happen either because of long GC pauses on your client side or
> > because of IO pauses on the ZooKeeper server side. That is why increasing
> > the session timeout seems to have helped. If this error happens
> > frequently, it will cause your consumer instances to keep rebalancing.
> >
> > Thanks,
> > Neha
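The explanation above can be turned into a simplified model. The ZooKeeper server expires a session when it hears nothing from the client for longer than the session timeout. Therefore any client-side GC pause or server-side IO stall that, added to the time since the client's last heartbeat, exceeds the timeout ends the session, and for a Kafka consumer that leads to a rebalance. The sketch below counts such expirations for the 6 s and 12 s timeouts mentioned in this thread. The model is an assumption: real ZooKeeper clients ping at a fraction of the timeout, which is folded here into a single hypothetical `since_last_heartbeat_s` parameter. The stall durations are illustrative, not measurements from this thread.

```python
def expired_sessions(stalls_s: list[float], timeout_s: float,
                     since_last_heartbeat_s: float = 0.0) -> int:
    """Count stalls long enough to expire a ZooKeeper session (simplified model).

    Assumption: the session expires when the silence the server observes,
    i.e. time since the last heartbeat plus the stall itself, exceeds the timeout.
    """
    return sum(1 for stall in stalls_s if since_last_heartbeat_s + stall > timeout_s)

# Hypothetical stall durations in seconds (client GC pauses or server IO stalls)
stalls = [0.2, 0.15, 4.5, 7.0, 9.8, 13.0]
for timeout in (6.0, 12.0):
    n = expired_sessions(stalls, timeout, since_last_heartbeat_s=2.0)
    print(f"timeout {timeout:>4} s: {n} expirations, each one followed by a consumer rebalance")
```

In this model, doubling the timeout removes most expirations without removing their cause, which matches the observation that raising the timeout from 6 to 12 seconds "helped somewhat".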
> >
> >
> > On Tue, Feb 5, 2013 at 5:41 PM, Manish Khettry <[EMAIL PROTECTED]> wrote:
> >
> > > We are trying to troubleshoot a problem wherein our system just cannot
> > > seem to read messages fast enough from Kafka. We are on Kafka 0.6 and
> > > are using the simple consumer.
> > >
> > > Looking at the logs, we see a lot of (almost constant, chatty) messages
> > > about rebalancing. For instance, every minute we see messages like
> > > this:
> > >
> > >
> > > Consumer rookery-vacuum-prod_<first_ip>.internal-1360106018385
> > > rebalancing the following partitions: List(0-0, 0-1, 0-10, 0-11, 0-12,
> > > 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, 0-4, 0-5, 0-6,
> > > 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16,
> > > 1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9) for topic
> > > compact-player-logs with consumers:
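The rebalancing message above lists 40 partitions. Assuming these pre-0.8 Kafka versions name partitions as `<broker id>-<partition number>`, the list is 20 partitions on each of brokers 0 and 1, printed in plain string order (which is why `0-10` appears before `0-2`). The small Python sketch below parses the list copied from the log and groups it by broker. The naming scheme is the assumption here, so the grouping is only as good as that reading.

```python
from collections import defaultdict

# Partition list copied from the rebalancing log message above.
LOGGED = ("0-0, 0-1, 0-10, 0-11, 0-12, 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, "
          "0-4, 0-5, 0-6, 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16, "
          "1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9")

def partitions_by_broker(logged: str) -> dict[int, list[int]]:
    """Group "<broker>-<partition>" names by broker (naming scheme assumed)."""
    groups: dict[int, list[int]] = defaultdict(list)
    for name in logged.split(", "):
        broker, part = name.split("-")
        groups[int(broker)].append(int(part))
    return {b: sorted(parts) for b, parts in groups.items()}

names = LOGGED.split(", ")
print(len(names))              # 40
print(names == sorted(names))  # True: consistent with plain string sorting
print({b: len(p) for b, p in partitions_by_broker(LOGGED).items()})  # {0: 20, 1: 20}
```

Every consumer in the group logs a message like this one whenever a rebalance runs, so a message every minute suggests a rebalance roughly every minute.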
> > >
> > >
> > > I also see ZooKeeper timeouts like this:
> > >
> > > Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > > has expired, closing socket connection
> > >
> > >
> > > We increased the ZooKeeper session timeout from 6 seconds to 12 seconds,
> > > and this seems to have helped somewhat, but I'm not sure if these
> > > ZooKeeper timeouts at 6 seconds are symptomatic of a problem with our
> > > ZooKeeper cluster and/or with connectivity between the consumers and ZK.
> > > Any thoughts?
> > >
> > > Manish
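Whether the 6-second expirations point at the ZooKeeper ensemble itself can be checked from the client side, provided the servers answer ZooKeeper's four-letter-word commands. An operator may have disabled them, and newer releases restrict them through the `4lw.commands.whitelist` setting. The `srvr` command (or `stat`) replies with server statistics that include a `Latency min/avg/max:` line in milliseconds. The minimal Python sketch below sends the command over a plain TCP socket and parses that line. The host name is a placeholder, and the parser accepts decimal values in case a server version formats the average that way. A consistently large maximum latency would hint at the server-side IO pauses described earlier in the thread.

```python
import re
import socket

LATENCY_RE = re.compile(r"Latency min/avg/max: ([0-9.]+)/([0-9.]+)/([0-9.]+)")

def parse_latency(srvr_output: str) -> tuple[float, float, float]:
    """Extract (min, avg, max) request latency in ms from `srvr`/`stat` output."""
    m = LATENCY_RE.search(srvr_output)
    if m is None:
        raise ValueError("no latency line in ZooKeeper response")
    return tuple(float(g) for g in m.groups())

def zk_four_letter(host: str, port: int = 2181, cmd: bytes = b"srvr",
                   timeout_s: float = 5.0) -> str:
    """Send a four-letter-word command and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(cmd)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Usage (host name is a placeholder for one of your ensemble members):
# print(parse_latency(zk_four_letter("zk1.example.internal")))
```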
> > >
> >
>