Kafka >> mail # user >> ConsumerRebalanceFailedException when broker unavailable


Re: ConsumerRebalanceFailedException
Yes, I think these are two separate issues.

F.

On 7/16/13 11:32 AM, "Joel Koshy" <[EMAIL PROTECTED]> wrote:

>From a user's perspective, ConsumerRebalanceFailedException is a bit
>cryptic - I think the other thread was about providing a more informative
>message and also being able to recover when a broker does come up (fixed
>in KAFKA-969).
>
>Thanks,
>
>Joel
>
>On Tue, Jul 16, 2013 at 11:04 AM, Vaibhav Puranik <[EMAIL PROTECTED]>
>wrote:
>> Thank you Joel.
>>
>> In a different but related thread, somebody is asking to rename the
>> exception to NoBrokerAvailableException. But given the description above,
>> the exception seems to be named appropriately.
>>
>> Regards,
>> Vaibhav
>>
>>
>> On Tue, Jul 16, 2013 at 12:05 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
>>
>>> Yes - a rebalance is the consumers in a group coordinating through ZK.
>>> A rebalance can happen when one or more of the following occur:
>>> - a consumed topic partition appears or disappears, i.e., a broker
>>>   comes or goes.
>>> - a consumer instance in the group comes or goes.
>>> A consumer "going" can also be triggered by session expirations in
>>> ZooKeeper, typically caused by client-side GC pauses or flaky
>>> connections to ZooKeeper.
>>>
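The two triggers described above can be illustrated with a small sketch. This is not Kafka's actual code: it is a simplified, range-style assignment written for illustration, and the function name `assign_partitions`, the consumer ids, and the `broker-partition` labels are all hypothetical. It only shows why a change in either the partition list or the consumer list forces every consumer in the group to recompute what it owns.

```python
def assign_partitions(consumers, partitions):
    """Give each consumer a contiguous slice of the sorted partition list.

    Illustrative range-style assignment; names and layout are assumptions.
    """
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    if not consumers:
        return {}
    # Spread the remainder over the first `extra` consumers.
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    for pos, consumer in enumerate(consumers):
        start = per_consumer * pos + min(pos, extra)
        count = per_consumer + (1 if pos < extra else 0)
        assignment[consumer] = partitions[start:start + count]
    return assignment


# Hypothetical group: 3 brokers with 2 partitions each, two consumers.
partitions = ["%d-%d" % (broker, part) for broker in range(3) for part in range(2)]
before = assign_partitions(["c1", "c2"], partitions)

# Trigger 1: broker 2 goes away, so its partitions disappear.
remaining = [p for p in partitions if not p.startswith("2-")]
after_broker_loss = assign_partitions(["c1", "c2"], remaining)

# Trigger 2: consumer c2 leaves (shutdown or ZooKeeper session expiry).
after_consumer_loss = assign_partitions(["c1"], partitions)

for label, result in [("before", before),
                      ("broker lost", after_broker_loss),
                      ("consumer lost", after_consumer_loss)]:
    print(label, result)
```

In both cases the ownership map changes for the surviving consumers, which is why each of them has to go through a rebalance and why a rebalance that cannot settle within the retry limit surfaces as an exception on the client.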
>>> On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <[EMAIL PROTECTED]>
>>> wrote:
>>> > Hi all,
>>> >
>>> > We have a small Kafka cluster (0.7.1, 3 nodes) in EC2. The load is
>>> > about 200 million events per day, each a few kilobytes. We have a
>>> > single-node ZooKeeper.
>>> >
>>> > Yesterday suddenly our Kafka clients started throwing the following
>>> > exception:
>>> >
>>> > java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException:
>>> > CONSUMER_GROUP_NAME_ip-00-00-00-00.ec2.internal-1373821190828-5f78e9af
>>> > can't rebalance after 4 retries
>>> >     at com.gumgum.kafka.consumer.KafkaTemplate.executeWithBatch(KafkaTemplate.java:59)
>>> >     at com.gumgum.storm.fileupload.GenericKafkaSpout.nextTuple(GenericKafkaSpout.java:73)
>>> >     at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
>>> >     at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
>>> >
>>> > None of the Kafka clients (ConsumerConnector class) would start. They
>>> > would fail with the exception.
>>> >
>>> > We tried restarting the clients and restarting ZooKeeper as well, but
>>> > it all finally started working only when we restarted all of our Kafka
>>> > brokers. We didn't lose any data because the producers (going directly
>>> > to the brokers through a load balancer) were working fine.
>>> >
>>> > I tried googling this issue and it looks like a lot of people have
>>> > faced it, but I couldn't find anything concrete.
>>> >
>>> > Given this, I have two questions:
>>> >
>>> > It would be nice if you could tell me why this can happen or point me
>>> > to a link where I can understand it better. What does consumer
>>> > rebalancing mean? Does it mean consumers are trying to coordinate
>>> > amongst themselves using ZooKeeper?
>>> >
>>> > On a separate note, are there any JMX parameters I need to be
>>> > monitoring to make sure that my Kafka cluster is healthy? How can I
>>> > keep watch on my Kafka cluster?
>>> >
>>> > Regards,
>>> > Vaibhav Puranik
>>> > GumGum
>>>