Kafka >> mail # user >> Instances became unresponsive


Re: Instances became unresponsive
No, they were actually stuck and did not respond to a shutdown request. I had to kill them with kill -9. I also tried to take a heap dump, which hung as well.
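
The sequence described above (a shutdown request that is ignored, followed by kill -9) can be sketched as a small escalation helper. This is only an illustration under assumptions: it assumes the process was launched from the same Python script through subprocess.Popen, which is not how the brokers in this thread were started, and the names stop_process and spawn_term_ignoring_child are hypothetical. SIGKILL gives the process no chance to clean up, so it is kept as the last resort.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=30.0):
    """Ask a child process to stop, then force it if it does not comply.

    Sends SIGTERM first (the polite shutdown request). If the process is
    still alive after grace_seconds, sends SIGKILL (the `kill -9` case).
    Returns "terminated" or "killed" so the caller can log what happened.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"


def spawn_term_ignoring_child():
    """Hypothetical stand-in for a stuck process: it ignores SIGTERM."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # wait until the SIGTERM handler is installed
    return proc


if __name__ == "__main__":
    child = spawn_term_ignoring_child()
    print(stop_process(child, grace_seconds=1.0))
```

Returning which path was taken makes it easy to record how often a forced kill was needed, which is itself a useful signal that something is wrong.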

Sent from my iPhone

On Aug 27, 2013, at 8:14 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> The errors you listed may not be serious, as long as they are transient.
> When you say 2 of the brokers are not responsive, are they issuing fetch
> requests to the 3rd broker (look at the request log)? During a restart of
> the whole cluster, brokers that are started later may not have any leader
> and thus won't take any request from the client. You will need to run the
> leader balance tool.
>
> Thanks,
>
> Jun
>
>
> On Mon, Aug 26, 2013 at 10:12 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
>
>> Somehow my Kafka instances keep crashing. I started the Kafka instances
>> one by one and they all started successfully. Later, two of the 3
>> instances somehow became completely unresponsive. The process is running,
>> but connecting over JMX or taking a heap dump is not possible. The last
>> one is somewhat responsive.
>> I am not sure how the servers got into this state. Is there anything I can
>> monitor to predict that instances are about to crash? What are the ways to
>> recover without data loss? What am I doing wrong to get into this state?
>> Please advise.
>> I poked around the error logs on the hosts that are not responsive, and
>> here are the errors I found. One that I have not listed is
>> LeaderNotFoundException.
>>
>> The most puzzling errors are the ZooKeeper ones, since ZooKeeper was not
>> redeployed or updated.
>> [2013-08-26 12:14:35,357] ERROR [KafkaApi-5] Error while fetching metadata
>> for partition [self_reactivation,0] (kafka.server.KafkaApis)
>> kafka.common.ReplicaNotAvailableException
>>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:471)
>>        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:456)
>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>        at scala.collection.immutable.List.foreach(List.scala:76)
>>        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>>
>>
>> in server.log
>> [2013-08-26 21:00:51,942] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/12/state with data {
>> "controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1,
>> "version":1 } and expected version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/12/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,943] INFO Partition [meetme,12] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: Shrinking
>> ISR for partition [meetme,4] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:51,993] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/4/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":1, "version":1 } and expected
>> version 2 failed due to
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> BadVersion for /brokers/topics/meetme/partitions/4/state
>> (kafka.utils.ZkUtils$)
>> [2013-08-26 21:00:51,993] INFO Partition [meetme,4] on broker 5: Cached
>> zkVersion [2] not equal to that in zookeeper, skip updating ISR
>> (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,103] INFO Partition [meetme,6] on broker 5: Shrinking
>> ISR for partition [meetme,6] from 5,4 to 5 (kafka.cluster.Partition)
>> [2013-08-26 21:00:52,107] ERROR Conditional update of path
>> /brokers/topics/meetme/partitions/6/state with data { "controller_epoch":6,
>> "isr":[ 5 ], "leader":5, "leader_epoch":2, "version":1 } and expected
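
On the question of what to monitor: one rough option is to count, per partition, how often the broker log reports an ISR shrink or a failed conditional update with BadVersion, as in the excerpts quoted above. The sketch below is only an illustration. The regular expressions are derived from the log lines in this thread, it assumes each entry sits on a single line in the real server.log (the mail client wrapped them here), and the function name scan is hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts quoted in this thread.
SHRINK_RE = re.compile(
    r"Shrinking ISR for partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\]"
)
BAD_VERSION_RE = re.compile(
    r"Conditional update of path "
    r"/brokers/topics/(?P<topic>[^/]+)/partitions/(?P<part>\d+)/state"
    r".*BadVersion"
)

# Two entries from the thread, joined back onto single lines.
SAMPLE = [
    "[2013-08-26 21:00:51,990] INFO Partition [meetme,4] on broker 5: "
    "Shrinking ISR for partition [meetme,4] from 5,4 to 5 "
    "(kafka.cluster.Partition)",
    "[2013-08-26 21:00:51,993] ERROR Conditional update of path "
    "/brokers/topics/meetme/partitions/4/state with data { "
    '"controller_epoch":6, "isr":[ 5 ], "leader":5, "leader_epoch":1, '
    '"version":1 } and expected version 2 failed due to '
    "org.apache.zookeeper.KeeperException$BadVersionException: "
    "KeeperErrorCode = BadVersion for "
    "/brokers/topics/meetme/partitions/4/state (kafka.utils.ZkUtils$)",
]


def scan(lines):
    """Count ISR shrinks and BadVersion failures per (topic, partition)."""
    shrinks = Counter()
    bad_versions = Counter()
    for line in lines:
        match = SHRINK_RE.search(line)
        if match:
            shrinks[(match["topic"], int(match["part"]))] += 1
        match = BAD_VERSION_RE.search(line)
        if match:
            bad_versions[(match["topic"], int(match["part"]))] += 1
    return shrinks, bad_versions


if __name__ == "__main__":
    shrinks, bad_versions = scan(SAMPLE)
    print("ISR shrinks:", dict(shrinks))
    print("BadVersion failures:", dict(bad_versions))
```

To run it against a real log, pass an open file object to scan instead of SAMPLE. A steadily growing count for the same partitions would be a reason to look at that broker before it stops responding.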

 