Kafka, mail # user - Instances became unresponsive


Instances became unresponsive
Vadim Keylis 2013-08-27, 05:13
Somehow my Kafka instances keep crashing. I started the Kafka instances one by one and they all came up successfully. Later, two of the three instances became completely unresponsive: the processes are still running, but connecting over JMX or taking a heap dump is not possible. The last one is only somewhat responsive.
I am not sure how the servers got into this state. Is there anything I can monitor to predict that an instance is about to crash? What are the ways to recover without data loss? What am I doing wrong to end up in this state? Please advise.
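To make the monitoring question concrete: the kind of check I have in mind is a small poller against each broker's JMX endpoint, for example watching the UnderReplicatedPartitions gauge; a poll that starts hanging would itself be a warning sign, since JMX connections to the dead brokers hang. This is just a sketch: broker1:9999 stands in for whatever JMX_PORT the broker was started with, and the quoted MBean name follows the 0.8-style Yammer naming (newer brokers use the unquoted kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions form).

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

object BrokerHealthProbe {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint: whatever JMX_PORT the broker was started with.
    val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi")
    val connector = JMXConnectorFactory.connect(url)
    try {
      val mbsc = connector.getMBeanServerConnection
      // 0.8-era Yammer naming quotes the domain and values.
      val gauge = new ObjectName(
        "\"kafka.server\":type=\"ReplicaManager\",name=\"UnderReplicatedPartitions\"")
      // A non-zero value (or a poll that starts timing out) is a warning sign.
      println("UnderReplicatedPartitions = " + mbsc.getAttribute(gauge, "Value"))
    } finally {
      connector.close()
    }
  }
}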
I poked around the error logs on the hosts that are not responding, and here are the errors I found. One that I have not listed is LeaderNotFoundException.

The most puzzling errors are the ZooKeeper ones, as ZooKeeper was not redeployed or updated.
[2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata for partition [self_reactivation,0] (kafka.server.KafkaApis)
kafka.common.ReplicaNotAvailableException
        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
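Since ReplicaNotAvailableException surfaces in topic-metadata responses, the broker's view of the partition can also be checked from a client; a rough sketch against the 0.8 SimpleConsumer API (broker1:9092 and the client id are placeholders):

import kafka.javaapi.TopicMetadataRequest
import kafka.javaapi.consumer.SimpleConsumer
import scala.collection.JavaConverters._

object MetadataProbe {
  def main(args: Array[String]): Unit = {
    // Placeholder broker endpoint and client id.
    val consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "metadata-probe")
    try {
      val response = consumer.send(
        new TopicMetadataRequest(java.util.Arrays.asList("self_reactivation")))
      for (topic <- response.topicsMetadata.asScala;
           part  <- topic.partitionsMetadata.asScala) {
        // errorCode is non-zero when e.g. a replica or leader is unavailable.
        println("partition " + part.partitionId +
          ": leader=" + part.leader +
          " isr=" + part.isr +
          " errorCode=" + part.errorCode)
      }
    } finally {
      consumer.close()
    }
  }
}

Running this against a healthy broker versus a hung one should show whether the metadata is actually inconsistent or the broker simply stops answering.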
In server.log:
[2013-08-26 21:00:51,942] ERROR Conditional update of path /brokers/topics/meetme/partitions/12/state with data { "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected version 2 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/meetme/partitions/12/state (kafka.utils.ZkUtils$)
[2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
[2013-08-26 21:00:51,993] ERROR Conditional update of path /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected version 2 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/meetme/partitions/4/state (kafka.utils.ZkUtils$)
[2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
[2013-08-26 21:00:52,107] ERROR Conditional update of path /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected version 3 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/meetme/partitions/6/state (kafka.utils.ZkUtils$)
[2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached zkVersion [3] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
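The BadVersion errors say the broker's cached zkVersion no longer matches the version ZooKeeper actually holds for the partition-state znode. To compare the two directly, a rough sketch that reads the znode's data and current version with the plain ZooKeeper client (zkhost:2181 is a placeholder for our ensemble):

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import org.apache.zookeeper.data.Stat

object PartitionStateVersion {
  def main(args: Array[String]): Unit = {
    // Placeholder connect string; point at the real ensemble.
    val zk = new ZooKeeper("zkhost:2181", 30000, new Watcher {
      override def process(event: WatchedEvent): Unit = ()
    })
    try {
      val stat = new Stat()
      val data = zk.getData("/brokers/topics/meetme/partitions/12/state", false, stat)
      println("data        = " + new String(data, "UTF-8"))
      // dataVersion is what the broker's conditional update compares against.
      println("dataVersion = " + stat.getVersion)
    } finally {
      zk.close()
    }
  }
}

The same information is visible from zkCli.sh, whose get command prints the znode stat including dataVersion (newer ZooKeeper CLIs need get -s).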