Kafka >> mail # user >> Instances became unresponsive


Re: Instances became unresponsive
The errors you listed may not be serious, as long as they are transient.
When you say 2 of the brokers are not responsive, are they issuing fetch
requests to the 3rd broker (look at the request log)? During a restart of
the whole cluster, brokers that are started later may not be the leader for
any partition and thus won't take any requests from clients. You will need
to run the leader balance tool.
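
To illustrate the point about brokers that lead nothing, here is a minimal Python sketch. It assumes you have already collected the partition state JSON (the same shape that appears in the quoted server.log below) into a dictionary. The function name, the dictionary layout, and broker id 3 are hypothetical, used only for illustration; this is not a Kafka API.

```python
import json


def brokers_without_leadership(partition_states, broker_ids):
    """Return the sorted broker ids that are not the leader of any partition.

    partition_states maps a partition label to its state JSON string
    (hypothetical layout, modeled on the state data in the quoted log).
    """
    leaders = set()
    for raw_state in partition_states.values():
        state = json.loads(raw_state)
        leaders.add(state["leader"])
    return sorted(b for b in broker_ids if b not in leaders)


# State strings shaped like the ones in the quoted server.log.
states = {
    "meetme-12": '{"controller_epoch":6,"isr":[5],"leader":5,"leader_epoch":1,"version":1}',
    "meetme-4": '{"controller_epoch":6,"isr":[5],"leader":5,"leader_epoch":1,"version":1}',
    "meetme-6": '{"controller_epoch":6,"isr":[5],"leader":5,"leader_epoch":2,"version":1}',
}

# Brokers 4 and 5 appear in the quoted log; broker 3 is an assumed third broker.
idle = brokers_without_leadership(states, [3, 4, 5])
print(idle)
```

If every partition is led by one broker, as in this sample, the other brokers receive no produce or fetch traffic from clients until leadership is rebalanced.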

Thanks,

Jun
On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> Somehow my Kafka instances are crashing. I started the Kafka
> instances one by one and they started successfully. Later, somehow, two
> of the 3 instances became completely unresponsive. The process is running,
> but connecting over JMX or taking a heap dump is not possible. The last one
> is somewhat responsive.
> I am not sure how the servers got into this state. Is there anything I can
> monitor to predict that instances are about to crash? What are the ways to
> recover without data loss? What am I doing wrong to get into this state?
> Please advise.
> I poked around the error logs on the hosts that are not responsive, and here
> are the errors I found. One that I have not listed is
> LeaderNotFoundException.
>
> The most puzzling one is about ZooKeeper, as it was not redeployed or updated.
> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
> for partition [self_reactivation,0] (kafka.server.KafkaApis)
> kafka.common.ReplicaNotAvailableException
>         at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>         at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>         at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>         at scala.collection.immutable.List.foreach(List.scala:76)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>
>
> in server.log
> [2013-08-26 21:00:51,942] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/12/state with data {
> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
> "version":1 } and expected version 2 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/12/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> [2013-08-26 21:00:51,993] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
> version 2 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/4/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
> zkVersion [2] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
> [2013-08-26 21:00:52,107] ERROR Conditional update of path
> /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
> version 3 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/meetme/partitions/6/state
> (kafka.utils.ZkUtils$)
> [2013-08-26 21:00:52,107] INFO Partition [meetme,6] on broker 5: Cached
> zkVersion [3] not equal to that in zookeeper, skip updating ISR
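
On the question of what to monitor: one low-tech option is to count how often each partition logs an ISR shrink or a stale cached zkVersion, since both messages repeat in the excerpt above. The BadVersion errors themselves come from ZooKeeper's conditional update, which fails when the znode's current version differs from the version the caller expected. The Python sketch below assumes the log lines have been unwrapped to one entry per line and follow the format quoted above. The regular expressions and the `summarize` function are illustrative assumptions, not part of Kafka.

```python
import re
from collections import Counter

# Patterns modeled on the kafka.cluster.Partition messages quoted above.
SHRINK_RE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\] on broker (?P<broker>\d+): "
    r"Shrinking ISR for partition \[[^\]]+\] from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)
STALE_RE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\] on broker (?P<broker>\d+): "
    r"Cached zkVersion \[(?P<cached>\d+)\] not equal to that in zookeeper"
)


def summarize(lines):
    """Count ISR shrinks and stale-zkVersion messages per (topic, partition)."""
    shrinks = Counter()
    stale = Counter()
    for line in lines:
        m = SHRINK_RE.search(line)
        if m:
            shrinks[(m["topic"], int(m["part"]))] += 1
            continue
        m = STALE_RE.search(line)
        if m:
            stale[(m["topic"], int(m["part"]))] += 1
    return shrinks, stale


# Entries copied from the excerpt above, unwrapped to one line each.
sample = [
    "[2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached "
    "zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)",
    "[2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking "
    "ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)",
    "[2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached "
    "zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)",
]

shrinks, stale = summarize(sample)
print(dict(shrinks))
print(dict(stale))
```

A partition whose stale count keeps growing without ever recovering may indicate that the broker's cached state has fallen out of step with ZooKeeper, which is worth checking before the broker stops responding.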

 