Kafka user mailing list


Re: Consumers constantly rebalancing
Definitely no long pauses on the consumer. I see a minor collection every
second, each taking 0.1 or 0.2 seconds. That in itself seems a bit on the
high side (~10-20% of time spent in GC), but I don't think that would cause a
zk session timeout. Getting GC stats on the zookeeper side is a bit
harder -- this is not a system we control!

So in your opinion, long GC pauses are the most likely explanation for this.
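Even without shell access to the ZooKeeper hosts, a server's four-letter-word
commands can hint at server-side latency as long as the client port is
reachable. A sketch (the hostname is a placeholder):

```shell
# Sketch: ask a ZooKeeper server for its request-latency stats via the
# "stat" four-letter command over the client port (2181 by default).
zk_latency() {
  echo stat | nc "$1" 2181 | grep 'Latency'
}
# Example call (placeholder host); "stat" reports a line like
#   Latency min/avg/max: ...
# A max latency approaching the session timeout points at server-side pauses.
#   zk_latency zk-host.internal
```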

m
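The minor-collection numbers above come from GC logs; a sketch of the
HotSpot flags (JDK 6/7 era) that produce them -- the log path and the
`KAFKA_OPTS` variable name are assumptions, pass the flags through whatever
wrapper launches your consumer:

```shell
# Sketch: GC-logging flags for the consumer JVM (HotSpot, JDK 6/7 era).
# The log path is an assumption; adjust for your deployment.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/kafka/consumer-gc.log"
export KAFKA_OPTS="$GC_LOG_OPTS"
# Each pause then appears in the log with its duration in seconds; any
# single pause near the ZK session timeout is the smoking gun.
echo "$KAFKA_OPTS"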
On Tue, Feb 5, 2013 at 8:27 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> The easiest way to diagnose is to enable GC logging on both the consumer
> and the zk instance and see if you have long pauses.
>
> -Jay
>
>
> On Tue, Feb 5, 2013 at 5:46 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > >> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > >> has expired, closing socket connection
> >
> > This can happen either due to long GC pauses on your client side or due to
> > IO pauses on the zookeeper server side.
> > That is the reason increasing the session timeout seems to have helped.
> > If this error happens frequently, it will cause your consumer instances to
> > keep rebalancing.
> >
> > Thanks,
> > Neha
> >
> >
> > On Tue, Feb 5, 2013 at 5:41 PM, Manish Khettry <[EMAIL PROTECTED]> wrote:
> >
> > > We are trying to troubleshoot a problem wherein our system just cannot
> > > seem to read messages fast enough from Kafka. We are on Kafka 0.6 and are
> > > using the simple consumer.
> > >
> > > Looking at the logs, we see a lot of (almost constant, chatty) messages
> > > about rebalancing. So for instance every minute, we see messages
> > > like this:
> > >
> > >
> > > Consumer rookery-vacuum-prod_<first_ip>.internal-1360106018385
> > > rebalancing the following partitions: List(0-0, 0-1, 0-10, 0-11, 0-12,
> > > 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, 0-4, 0-5, 0-6,
> > > 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16,
> > > 1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9) for topic
> > > compact-player-logs with consumers:
> > >
> > >
> > > I also see zookeeper timeouts like so:
> > >
> > > Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > > has expired, closing socket connection
> > >
> > >
> > > We increased the zookeeper session timeout from 6 seconds to 12 seconds
> > > and this seems to have helped somewhat, but I'm not sure if these
> > > zookeeper timeouts at 6 seconds are symptomatic of a problem with our
> > > zookeeper cluster and/or connectivity between the consumers and zk.
> > > Any thoughts?
> > >
> > > Manish
> > >
> >
>
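The timeout bump Manish describes would look something like this in the
consumer config. Property names follow the 0.6/0.7-era Kafka consumer and may
differ slightly in your build; the `zk.connect` hosts are placeholders:

```shell
# Sketch: write a consumer config with the raised ZK session timeout.
# Property names are from the 0.6/0.7-era consumer; hosts are placeholders.
cat <<'EOF' > consumer.properties
zk.connect=zk1:2181,zk2:2181,zk3:2181
# raised from the 6000 ms default, per the thread
zk.sessiontimeout.ms=12000
zk.connectiontimeout.ms=12000
EOF
grep -c 'timeout' consumer.properties   # -> 2
```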
