What is the full stack trace? If you see "can't rebalance after 4 retries", then the likely problem is that the broker is down or unavailable.
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Fri, Nov 29, 2013 at 11:31 AM, Yu, Libo <[EMAIL PROTECTED]> wrote:
Is the failure on the last rebalance? If so, some partitions will not have any consumers. A common reason for rebalance failure is a conflict in partition ownership among different consumers in the same group. Increasing the number of retries and the amount of backoff time between retries should help. Our default settings should be good enough if there are not too many subscribed topics and the ZooKeeper latency is normal.
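For reference, a sketch of the relevant old (0.8-era) high-level consumer properties; the values shown are my understanding of the defaults, so verify them against your Kafka version before relying on them:

```properties
# Retries and backoff for the consumer-side rebalance discussed above.
# Raise these if rebalances fail with many topics or a slow ZooKeeper.
rebalance.max.retries=4
rebalance.backoff.ms=2000

# ZooKeeper session/connection timeouts (defaults shown).
zookeeper.session.timeout.ms=6000
zookeeper.connection.timeout.ms=6000
```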
Jun

On Mon, Dec 2, 2013 at 6:57 AM, Yu, Libo <[EMAIL PROTECTED]> wrote:
I am getting a consumer rebalance failed exception if I restart my consumer within 1-3 seconds.
The exception trace is:

Caused by: kafka.common.ConsumerRebalanceFailedException: indexConsumerGroup1_IMPETUS-I0027C-1388416992091-ac0d82d7 can't rebalance after 4 retries
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown Source)
    at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(Unknown Source)
    at kafka.consumer.ZookeeperConsumerConnector.consume(Unknown Source)
    at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(Unknown Source)
    at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(Unknown Source)

Does this exception depend on any of the properties below?
zookeeper.session.timeout.ms=6000
zookeeper.connection.timeout.ms=6000
If I kill the consumer and start it again after 5-6 seconds, it works properly without throwing any exception.
If I start the consumer immediately after killing it, the ConsumerRebalanceFailedException occurs.
The default zookeeper.session.timeout.ms is 6000, and looking into the details, this value is negotiable. We tried to set it to less than 4000 ms so the session would expire earlier, but ZooKeeper negotiates the timeout and set it to 4000 ms.
We have a backend script that checks every second whether the consumer service is running and starts it if not, so the consumer service is restarted within a second, without any wait. *"connector.shutdown()" would be a good option for this, but it will not work if the consumer is killed abnormally with kill -9.*
Another option I see is to put "Thread.sleep(sessionTimeoutMilliseconds)" in the consumer service before it starts, but that is not a good option either.
*When ConsumerRebalanceFailedException occurs, the consumer stops consuming data. But the expected behaviour should be: if ConsumerRebalanceFailedException occurs because of a stale ZooKeeper session, the consumer should wait for the session timeout interval; once the previous session has expired, it should reconnect and start consuming data again.*
Any other way to handle it?
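One way to approximate that behaviour from the application side is a plain retry wrapper around consumer startup. This is only a sketch in stock Java, not a Kafka API: the startup action is abstracted as a Runnable, and in a real consumer you would catch ConsumerRebalanceFailedException specifically and use your negotiated session timeout as the wait:

```java
public final class RetryOnRebalanceFailure {

    /**
     * Runs {@code start}, retrying up to {@code maxAttempts} times and
     * sleeping {@code waitMs} between attempts so a stale ZooKeeper
     * session can expire before the consumer tries to re-register.
     * Returns the number of attempts actually made.
     */
    public static int runWithRetry(Runnable start, int maxAttempts, long waitMs)
            throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                start.run();              // e.g. createMessageStreams(...)
                return attempt;           // startup succeeded
            } catch (RuntimeException e) { // e.g. ConsumerRebalanceFailedException
                if (attempt >= maxAttempts) {
                    throw e;              // give up after the last attempt
                }
                Thread.sleep(waitMs);     // wait roughly one session timeout
            }
        }
    }
}
```

With waitMs set to the session timeout (e.g. 6000 ms), the immediate-restart case simply waits out the old session instead of failing permanently, which your one-second watchdog script could call instead of starting the consumer directly.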
Also, I want to know: what is the suggested value for zookeeper.session.timeout.ms in production?

*Thanks & Regards*
*Hanish Bansal*

On Mon, Dec 30, 2013 at 11:49 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
One alternative is to check the ZooKeeper consumer registration path: if the node is gone, then try to restart the consumer after the session timeout.
Guozhang

On Mon, Dec 30, 2013 at 7:56 PM, Hanish Bansal <[EMAIL PROTECTED]> wrote:
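To make that concrete: the old high-level consumer registers itself under /consumers/&lt;group&gt;/ids/&lt;consumer-id&gt; in ZooKeeper. A restart script can poll that path with a ZooKeeper client (e.g. ZooKeeper.exists) and only start the consumer once the stale node has disappeared. A minimal sketch of just the path construction, matching the consumer id in the stack trace above:

```java
public final class ConsumerRegistrationPath {

    /**
     * Builds the ZooKeeper path where the old high-level consumer
     * registers itself: /consumers/<group>/ids/<consumer-id>.
     * A watchdog can wait until this node is gone before restarting.
     */
    public static String of(String group, String consumerId) {
        return "/consumers/" + group + "/ids/" + consumerId;
    }
}
```

The actual existence check (ZooKeeper.exists against a live ensemble) is left out here since it needs a running ZooKeeper; the point is only which path to watch.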