Yes - a rebalance means the consumers in a group are coordinating through
ZooKeeper (ZK) to divide the topic partitions among themselves.
A rebalance is triggered when one or more of the following happens:
- a consumed topic partition appears or disappears - i.e., if a broker
comes or goes.
- a consumer instance in the group comes or goes
A consumer "going" can also be caused by a session expiration in ZooKeeper,
typically due to client-side GC pauses or a flaky connection to ZooKeeper.
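
To make the triggers above more concrete, here is a rough sketch of a
range-style assignment: each consumer sorts the partitions and the group
members it currently sees, then claims a contiguous slice. This is a
simplified illustration, not Kafka's actual code; the class name, method
name, and the string ids for partitions and consumers are all made up for
the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified sketch of range-style partition assignment (illustrative only).
class RebalanceSketch {

    // Returns the partitions consumer `me` would own, given its current view
    // of the partitions and of the group members (hypothetical string ids).
    static List<String> assign(List<String> partitions, List<String> consumers, String me) {
        List<String> sortedParts = new ArrayList<>(partitions);
        List<String> sortedConsumers = new ArrayList<>(consumers);
        Collections.sort(sortedParts);
        Collections.sort(sortedConsumers);

        int myPos = sortedConsumers.indexOf(me);
        if (myPos < 0) {
            return new ArrayList<>(); // not a member of the group
        }
        int perConsumer = sortedParts.size() / sortedConsumers.size();
        int extra = sortedParts.size() % sortedConsumers.size();
        // The first `extra` consumers each take one additional partition.
        int start = perConsumer * myPos + Math.min(myPos, extra);
        int count = perConsumer + (myPos < extra ? 1 : 0);
        return new ArrayList<>(sortedParts.subList(start, start + count));
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("0-0", "0-1", "1-0", "1-1", "2-0", "2-1");
        // Two consumers in the group.
        System.out.println(assign(parts, Arrays.asList("c1", "c2"), "c1"));
        // A third consumer joins: c1's slice shrinks, so it must release partitions.
        System.out.println(assign(parts, Arrays.asList("c1", "c2", "c3"), "c1"));
    }
}
```

Because every consumer computes its slice from its own view of the group,
any change in the partition or member list changes the slices, and each
consumer has to release and re-claim partitions. If those views keep
shifting (for example while sessions are expiring), the claims can collide
until the retries run out.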
On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <[EMAIL PROTECTED]> wrote:
> Hi all,
> We have a small Kafka cluster (0.7.1 - 3 nodes) in EC2. The load is about
> 200 million events per day, each a few kilobytes. We have a single node
> Yesterday suddenly our Kafka clients started throwing the following
> java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException:
> can't rebalance after 4 retries
> at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
> None of the Kafka clients (ConsumerConnector class) would start. They would
> fail with the exception.
> We tried restarting the clients and restarting ZooKeeper as well. But
> finally it all started working when we restarted all of our kafka brokers.
> We didn't lose any data because producers (going directly to the brokers
> through a load balancer) were working fine.
> I tried googling this issue and it looks like a lot of people have faced it,
> but I couldn't find anything concrete.
> Given this, I have two questions:
> It would be nice if you could tell me why this can happen or point me to a
> link where I can understand it better. What does Consumer Rebalancing mean?
> Does that mean consumers are trying to coordinate amongst themselves using
> On a separate note, are there any JMX parameters I need to be monitoring to
> make sure that my kafka cluster is healthy? How can I keep watch on my
> kafka cluster?
> Vaibhav Puranik