Kafka, mail # user - Instances became unresponsive


+
Vadim Keylis 2013-08-27, 05:13
+
Jun Rao 2013-08-27, 15:15
+
Vadim Keylis 2013-08-27, 16:13
Re: Instances became unresponsive
Neha Narkhede 2013-08-27, 17:06
When you said you tried to shut down the broker, did you try a controlled
shutdown? Do you see "Shutting down" in the logs of the broker that seemed
to hang?

Thanks,
Neha
On Tue, Aug 27, 2013 at 9:12 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> No. They actually were stuck and did not respond to the shutdown request. I
> had to kill them with kill -9. I tried to take a heap dump, which hung as well.
>
> Sent from my iPhone
>
> On Aug 27, 2013, at 8:14 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > The errors you listed may not be serious, as long as they are transient.
> > When you say 2 of the brokers are not responsive, are they issuing fetch
> > requests to the 3rd broker (look at the request log)? During a restart of
> > the whole cluster, brokers that are started later may not have any leader
> > and thus won't take any request from the client. You will need to run the
> > leader balance tool.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
> >
> >> Somehow my Kafka instances are crashing. I started the Kafka instances
> >> one by one and they started successfully. Later, two of the 3 instances
> >> somehow became completely unresponsive. The process is running, but
> >> connecting over JMX or taking a heap dump is not possible. The last one is
> >> somewhat responsive.
> >> I am not sure how the servers got into this state. Is there anything I can
> >> monitor to predict that instances are about to crash? What are the ways to
> >> recover without data loss? What am I doing wrong to get to this state?
> >> Please advise.
> >> I poked around the error logs on the hosts that are not responsive, and
> >> here are the errors I found. One that I have not listed is
> >> LeaderNotFoundException.
> >>
> >> The most puzzling one is about ZooKeeper, as it was not redeployed or
> >> updated.
> >> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata for partition [self_reactivation,0] (kafka.server.KafkaApis)
> >> kafka.common.ReplicaNotAvailableException
> >>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
> >>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
> >>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> >>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
> >>        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> >>        at scala.collection.immutable.List.foreach(List.scala:76)
> >>        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> >>
> >>
> >> in server.log
> >> [2013-08-26 21:00:51,942] ERROR Conditional update of path /brokers/topics/meetme/partitions/12/state with data { "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected version 2 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/meetme/partitions/12/state (kafka.utils.ZkUtils$)
> >> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached zkVersion [2] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> >> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
> >> [2013-08-26 21:00:51,993] ERROR Conditional update of path /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected version 2 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/meetme/partitions/4/state (kafka.utils.ZkUtils$)
> >> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
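
The question in the quoted message about what to monitor, together with the point that these errors may not be serious as long as they are transient, suggests a simple check: count how often the ISR-shrink and BadVersion messages occur per partition and see whether they keep recurring. The following is only a rough sketch, not a Kafka tool. It assumes each log entry sits on a single line (as in the broker's own server.log, not as wrapped by mail quoting), the regular expressions are derived solely from the lines quoted in this thread, and the function name `summarize` is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns derived from the server.log lines quoted in this thread (assumption:
# other Kafka versions may word these messages differently).
SHRINK_RE = re.compile(r"Shrinking ISR for partition \[([^,\]]+),(\d+)\]")
BAD_VERSION_RE = re.compile(
    r"BadVersion for /brokers/topics/([^/]+)/partitions/(\d+)/state"
)


def summarize(lines):
    """Count ISR shrinks and BadVersion failures per (topic, partition)."""
    shrinks = Counter()
    bad_versions = Counter()
    for line in lines:
        match = SHRINK_RE.search(line)
        if match:
            shrinks[(match.group(1), int(match.group(2)))] += 1
        match = BAD_VERSION_RE.search(line)
        if match:
            bad_versions[(match.group(1), int(match.group(2)))] += 1
    return shrinks, bad_versions


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python isr_summary.py /path/to/server.log  (path is illustrative)
    with open(sys.argv[1], errors="replace") as handle:
        shrink_counts, bad_version_counts = summarize(handle)
    for key, count in shrink_counts.most_common():
        print("ISR shrinks", key, count)
    for key, count in bad_version_counts.most_common():
        print("BadVersion", key, count)
```

Counts that stay flat after a restart would be consistent with transient errors; counts that keep growing for the same partitions are a reason to look at the request log and the ZooKeeper connection more closely. This does not by itself predict a hang.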

 
+
Vadim Keylis 2013-08-27, 18:37
+
Jun Rao 2013-08-28, 03:52
+
Vadim Keylis 2013-08-28, 06:51
+
Vadim Keylis 2013-08-28, 07:14
+
Jun Rao 2013-08-28, 14:49