I am getting a ConsumerRebalanceFailedException if I restart my consumer within 1-3 seconds of stopping it.
The exception trace is:
Caused by: kafka.common.ConsumerRebalanceFailedException: indexConsumerGroup1_IMPETUS-I0027C-1388416992091-ac0d82d7 can't rebalance after 4 retries
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown Source)
    at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(Unknown Source)
    at kafka.consumer.ZookeeperConsumerConnector.consume(Unknown Source)
    at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(Unknown Source)
    at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(Unknown Source)

Does this exception depend on either of the properties below?

zookeeper.session.timeout.ms 6000
zookeeper.connection.timeout.ms 6000
If I kill the consumer and start it again after 5-6 seconds, it works properly without throwing any exception.
If I start the consumer immediately after killing it, the ConsumerRebalanceFailedException occurs.
The default zookeeper.session.timeout.ms is 6000, and from what I have read this value is negotiated with the ZooKeeper server. We tried setting it below 4000 so that the session expires sooner, but ZooKeeper negotiated it up to 4000 ms.
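For reference, here is a minimal sketch of how these timeouts relate to the rebalance retry settings. It assumes the 0.8 high-level consumer property names `rebalance.max.retries` and `rebalance.backoff.ms`, which are not mentioned above; the "4 retries" in the trace corresponds to the former. The class name `ConsumerProps`, its methods, and the values are hypothetical and illustrative only, not recommendations. A commonly cited rule of thumb is that retries multiplied by backoff should exceed the ZooKeeper session timeout, so that a restarted consumer keeps retrying until the old session has expired.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer properties and checks the rule of thumb
// rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms.
final class ConsumerProps {

    static Properties build(String zkConnect, String groupId) {
        Properties p = new Properties();
        p.setProperty("zookeeper.connect", zkConnect);
        p.setProperty("group.id", groupId);
        p.setProperty("zookeeper.session.timeout.ms", "6000");
        p.setProperty("zookeeper.connection.timeout.ms", "6000");
        // Illustrative values only: 8 retries * 2000 ms = 16000 ms of retrying.
        p.setProperty("rebalance.max.retries", "8");
        p.setProperty("rebalance.backoff.ms", "2000");
        return p;
    }

    // True when the total retry window is longer than the session timeout.
    static boolean retriesOutlastSession(Properties p) {
        long retries = Long.parseLong(p.getProperty("rebalance.max.retries"));
        long backoffMs = Long.parseLong(p.getProperty("rebalance.backoff.ms"));
        long sessionMs = Long.parseLong(p.getProperty("zookeeper.session.timeout.ms"));
        return retries * backoffMs > sessionMs;
    }
}
```

The resulting `Properties` object would be passed to `new ConsumerConfig(props)` when creating the connector.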
We have a backend script that checks every second whether the consumer service is running and starts it if it is not. As a result, the consumer service is restarted within a second, without any wait. *Calling "connector.shutdown()" is a good option for a clean stop, but it does not help when the consumer is killed abnormally with kill -9*.
The other option I see is to put "Thread.sleep(sessionTimeoutMilliseconds)" in the consumer service before it starts, but that is not a good option either.
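One alternative to an unconditional sleep is to attempt the start immediately and wait only if it fails. The sketch below is a generic retry helper under that assumption; `StartupRetry` and `retry` are hypothetical names, and the supplier passed in is assumed to create a fresh connector and call `createMessageStreams` on each attempt (shutting down the failed connector first).

```java
import java.util.function.Supplier;

// Hypothetical helper: run an action, and on a runtime failure wait and try again.
final class StartupRetry {

    static <T> T retry(Supplier<T> action, int maxAttempts, long waitMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // ConsumerRebalanceFailedException is unchecked, so it lands here.
                last = e;
                if (attempt < maxAttempts) {
                    sleep(waitMs);
                }
            }
        }
        throw last; // every attempt failed; surface the last failure
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

With `waitMs` set to roughly the session timeout, a clean restart pays no delay, and a restart after kill -9 waits only as long as the stale session is still alive.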
*When ConsumerRebalanceFailedException occurs, the consumer stops consuming data. The expected behaviour would be: if ConsumerRebalanceFailedException occurs because of the ZooKeeper session, the consumer should wait for the timeout interval. Once the previous session has timed out, it should reconnect and start consuming data.*
Is there any other way to handle this?
Also, what is the suggested value for zookeeper.session.timeout.ms in production?

On Mon, Dec 30, 2013 at 11:49 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:

*Thanks & Regards*
*Hanish Bansal*
The option that looks fine to me is: check the ZooKeeper consumer registration path, and once the node is gone, restart the consumer, waiting at most the session timeout.
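The check described above could be sketched roughly as follows. This is only an outline: `StaleRegistrationWait` and `awaitGone` are hypothetical names, and the `stillRegistered` callback is an assumption standing in for a real ZooKeeper lookup (for example, an exists check on the old consumer's node under /consumers/<group>/ids). How the old consumer id is found after a kill -9 is left open here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll until the stale registration disappears or a timeout passes.
final class StaleRegistrationWait {

    // Returns true if the registration went away in time, false on timeout.
    static boolean awaitGone(BooleanSupplier stillRegistered, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (stillRegistered.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still registered after the timeout
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as "not confirmed gone"
            }
        }
        return true;
    }
}
```

The startup code would call `awaitGone` with the session timeout as `timeoutMs` and create the connector only afterwards, so a restart after a clean shutdown proceeds almost immediately.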
The catch is that we will have to implement this ourselves, as the high-level consumer does not currently take care of it.

On Tue, Dec 31, 2013 at 10:20 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

*Thanks & Regards*
*Hanish Bansal*
I tried to reproduce this exception. In case one, with no broker running, I launched all the consumers and got this exception. In case two, with the consumers and brokers running, I shut down the brokers one by one and did not see this exception. I wonder why the exception did not occur in case two. Thanks.

Regards,