Kafka, mail # dev - Re: zkclient dies after UnknownHostException in zk reconnect - 2013-09-24, 15:28
Re: zkclient dies after UnknownHostException in zk reconnect
Joel - that is exactly right. ZkClient has no way to notify consumers of
this situation. The session-end event gets fired, but the session-begin
event never occurs.

Neha - The issue manifested itself when producers were attempting to
discover topics/brokers. The Kafka brokers had lost their ZK sessions
during a network outage. The outage was long enough for ZooKeeper to expire
the sessions corresponding to the ephemeral nodes in /broker/. The zkclient
bug prevented the brokers from ever re-establishing their ZK sessions.
As a result, no ZooKeeper-based producer was able to discover
topic->broker mappings. The resulting exceptions looked like:

Caused by: kafka.common.NoBrokersForPartitionException: Partition = null
at kafka.producer.Producer.kafka$producer$Producer$getPartitionListForTopic(Producer.scala:167)
at kafka.producer.Producer$anonfun$3.apply(Producer.scala:116)
at kafka.producer.Producer$anonfun$3.apply(Producer.scala:105)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:33)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:33)
at kafka.producer.Producer.zkSend(Producer.scala:105)
at kafka.producer.Producer.send(Producer.scala:99)
at com.yieldmo.common.protobuf.ProtoKafkaWriter$class.write(ProtoKafka.scala:20)
at com.yieldmo.common.protobuf.ProtoWriter.write(ProtoKafka.scala:40)
at com.yieldmo.storm.bolt.KafkaProtoWriterBolt.execute(KafkaProtoWriterBolt.scala:48)

As far as I can see, the only way to deal with this without patching
zkclient is to periodically check the status of the zk connection and try
to detect this kind of situation. I would love to hear better ideas for how
to handle this.
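
For what it's worth, here is a rough sketch of the periodic check I have in mind. Everything in it is hypothetical: the class name ZkSessionWatchdog, the isConnected probe, and the rebuildClient callback are placeholders of mine, not part of zkclient or Kafka. The probe and the rebuild action are passed in as plain callbacks because the right health check depends on the client version in use. The sketch only shows the "N consecutive failed probes, then recreate the client" logic, and I have not run it against a real outage.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical watchdog: polls a connection probe and triggers a rebuild
 * after a number of consecutive failures. Not part of zkclient or Kafka.
 */
public class ZkSessionWatchdog implements AutoCloseable {
    private final BooleanSupplier isConnected; // assumed probe supplied by the caller
    private final Runnable rebuildClient;      // assumed action: close and recreate the client
    private final int maxFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "zk-session-watchdog");
            t.setDaemon(true); // do not keep the JVM alive just for the watchdog
            return t;
        });

    public ZkSessionWatchdog(BooleanSupplier isConnected, Runnable rebuildClient, int maxFailures) {
        this.isConnected = isConnected;
        this.rebuildClient = rebuildClient;
        this.maxFailures = maxFailures;
    }

    /** Runs one probe. Returns true if this call triggered a rebuild. */
    public boolean checkOnce() {
        boolean healthy;
        try {
            healthy = isConnected.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a probe that throws is treated as a failed probe
        }
        if (healthy) {
            consecutiveFailures.set(0);
            return false;
        }
        if (consecutiveFailures.incrementAndGet() >= maxFailures) {
            consecutiveFailures.set(0); // start counting again after the rebuild
            rebuildClient.run();
            return true;
        }
        return false;
    }

    /** Schedules checkOnce() with a fixed delay between runs. */
    public void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // swallow so that one failed rebuild does not cancel future checks
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Requiring several consecutive failures is meant to ride out ordinary short disconnects, which the client recovers from on its own, and to react only when the session stays down. The threshold and the polling period would have to be tuned against the configured session timeout.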
On Tue, Sep 24, 2013 at 3:31 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
 