Kafka, mail # user - Instances became unresponsive


Re: Instances became unresponsive
Vadim Keylis 2013-08-27, 16:13
No, they were actually stuck and did not respond to the shutdown request. I had to kill them with kill -9. I tried to take a heap dump, but that hung as well.

Sent from my iPhone
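
On the monitoring question in the quoted message below: one rough option is to count the "Shrinking ISR" and BadVersion lines per partition in server.log and watch for a sudden rise. What follows is a minimal Python sketch, assuming each log entry sits on a single line in the actual file (the mail client wrapped them here). The function name count_isr_events is made up for illustration, and whether these counts really predict a hang is an assumption, not something confirmed in this thread.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this thread.
SHRINK = re.compile(r"Shrinking ISR for partition \[([^,\]]+),(\d+)\]")
BAD_VERSION = re.compile(
    r"BadVersion for /brokers/topics/([^/]+)/partitions/(\d+)/state"
)


def count_isr_events(lines):
    """Count ISR shrinks and failed conditional updates per (topic, partition).

    Hypothetical helper: `lines` is any iterable of log lines, one entry per line.
    """
    shrinks, bad_versions = Counter(), Counter()
    for line in lines:
        m = SHRINK.search(line)
        if m:
            shrinks[(m.group(1), int(m.group(2)))] += 1
        m = BAD_VERSION.search(line)
        if m:
            bad_versions[(m.group(1), int(m.group(2)))] += 1
    return shrinks, bad_versions
```

Passing it the lines of a broker's server.log (for example, an open file object) gives two counters keyed by (topic, partition), which can be compared between runs.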

On Aug 27, 2013, at 8:14 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> The errors you listed may not be serious, as long as they are transient.
> When you say 2 of the brokers are not responsive, are they issuing fetch
> requests to the 3rd broker (look at the request log)? During a restart of
> the whole cluster, brokers that are started later may not have any leader
> and thus won't take any request from the client. You will need to run the
> leader balance tool.
>
> Thanks,
>
> Jun
>
>
> On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
>
>> Somehow my Kafka instances keep crashing. I started the Kafka instances
>> one by one and they started successfully. Later, two of the 3 instances
>> somehow became completely unresponsive. The process is running, but
>> connecting over JMX or taking a heap dump is not possible. The last one
>> is somewhat responsive.
>> I am not sure how the servers got into this state. Is there anything I can
>> monitor to predict that instances are about to crash? What are the ways to
>> recover without data loss? What am I doing wrong to get into this state?
>> Please advise.
>> I poked around the error logs on the hosts that are not responsive, and
>> here are the errors I found. One that I have not listed is
>> LeaderNotFoundException.
>>
>> The most puzzling errors are the ZooKeeper ones, as ZooKeeper was not
>> redeployed or updated.
>> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
>> for partition [self_reactivation,0] (kafka.server.KafkaApis)
>> kafka.common.ReplicaNotAvailableException
>>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>        at scala.collection.immutable.List.foreach(List.scala:76)
>>        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>>
>>
>> In server.log:
>> [2013-08-26 21:00:51,942] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/12/state with data {
>> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
>> "version":1 } and expected version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/12/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
>> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,993] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
>> version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/4/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
>> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,107] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected