Kafka >> mail # dev >> zkclient dies after UnknownHostException in zk reconnect


Anatoly Fayngelerin 2013-09-23, 14:02
Neha Narkhede 2013-09-24, 03:44
Joel Koshy 2013-09-24, 07:32
Anatoly Fayngelerin 2013-09-24, 15:28
Neha Narkhede 2013-09-24, 15:34
Re: zkclient dies after UnknownHostException in zk reconnect
That does sound like a saner solution. Which GitHub repo do you submit
patches to? It looks like the repo I posted on originally
(https://github.com/sgroschupf/zkclient/issues/23) might be a little stale.
On Tue, Sep 24, 2013 at 11:34 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Thanks for explaining the bug. This is a serious issue that we should fix
> at the zkclient level. We have submitted patches to them before and they
> were pretty helpful in releasing a new version with the patch. I think that
> will lead to a cleaner solution than trying to get around it in Kafka code
> since zkclient usage is pretty widespread across the server and consumer
> code today.
>
> Thanks,
> Neha
>
>
> On Tue, Sep 24, 2013 at 8:28 AM, Anatoly Fayngelerin <[EMAIL PROTECTED]> wrote:
>
> > Joel - that is exactly right. ZkClient has no way to notify consumers of
> > this situation. The session end event gets fired; however, the session
> > begin event never occurs.
> >
> > Neha - The issue manifested itself when producers were attempting to
> > discover topics/brokers. The Kafka brokers had lost their ZK sessions
> > during a network outage. The outage was long enough for ZooKeeper to expire
> > the sessions corresponding to the ephemeral nodes in /broker/. The zkclient
> > bug prevented the broker from ever re-establishing the ZK session.
> > Subsequently, no ZooKeeper-based producer was able to discover
> > topic->broker mappings. The resulting exceptions looked like:
> >
> > Caused by: kafka.common.NoBrokersForPartitionException: Partition = null
> > at kafka.producer.Producer.kafka$producer$Producer$getPartitionListForTopic(Producer.scala:167)
> > at kafka.producer.Producer$anonfun$3.apply(Producer.scala:116)
> > at kafka.producer.Producer$anonfun$3.apply(Producer.scala:105)
> > at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
> > at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
> > at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
> > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:33)
> > at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:33)
> > at kafka.producer.Producer.zkSend(Producer.scala:105)
> > at kafka.producer.Producer.send(Producer.scala:99)
> > at com.yieldmo.common.protobuf.ProtoKafkaWriter$class.write(ProtoKafka.scala:20)
> > at com.yieldmo.common.protobuf.ProtoWriter.write(ProtoKafka.scala:40)
> > at com.yieldmo.storm.bolt.KafkaProtoWriterBolt.execute(KafkaProtoWriterBolt.scala:48)
> >
> > As far as I can see, the only way to deal with this without patching
> > zkclient is to periodically check the status of the zk connection and try
> > to detect this kind of situation. I would love to hear better ideas for how
> > to handle this.
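
A minimal sketch of the periodic check described above, assuming the application can observe both the "session expired" and "new session" events (I believe zkclient exposes these through the handleStateChanged and handleNewSession callbacks of IZkStateListener, but treat that wiring as an assumption). The class name SessionWatchdog, its methods, and the grace period are hypothetical and are not part of zkclient or Kafka. The idea is simply to remember when the session expired and to report the client as stuck if no new session shows up within the grace period.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of zkclient or Kafka): records when a
 * ZooKeeper session expired and reports whether a new session has failed
 * to appear within a grace period.
 */
class SessionWatchdog {
    private static final long NOT_EXPIRED = -1L;

    private final long graceMillis;
    // Time (ms) at which the session was first seen to expire, or NOT_EXPIRED.
    private final AtomicLong expiredAt = new AtomicLong(NOT_EXPIRED);

    SessionWatchdog(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Call when the client reports that the session has expired. */
    void onSessionExpired(long nowMillis) {
        // Keep the earliest expiry time if several expiry events arrive.
        expiredAt.compareAndSet(NOT_EXPIRED, nowMillis);
    }

    /** Call when the client reports that a new session was established. */
    void onNewSession() {
        expiredAt.set(NOT_EXPIRED);
    }

    /** True if the session expired more than graceMillis ago and never came back. */
    boolean isStuck(long nowMillis) {
        long since = expiredAt.get();
        return since != NOT_EXPIRED && nowMillis - since > graceMillis;
    }
}
```

A scheduled task (for example via a ScheduledExecutorService) could call isStuck(System.currentTimeMillis()) every few seconds and, when it returns true, tear down and recreate the ZooKeeper client or restart the process. What the right recovery action is depends on the application; the sketch only covers detection.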
> >
> >
> > On Tue, Sep 24, 2013 at 3:31 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> >
> > > > node loss. Did the Kafka consumer not respond to rebalance events or did
> > > > the server not respond to state change events? Also, ephemeral nodes are
> > > > lost only when sessions are expired on the ZooKeeper server or if clients
> > > > close the session actively. How does losing the connection lead to
> > > > ephemeral node loss?
> > >
> > > My understanding of Anatoly's observation is that on session
> > > expiration, zkclient will reconnect
> > > (https://github.com/sgroschupf/zkclient/blob/master/src/main/java/org/I0Itec/zkclient/ZkClient.java#L458)
> > > but if the connect causes an IOException, that would effectively mean
> > > that the session will not get re-established. Anatoly, can you
> > > confirm?
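
To illustrate the failure mode Joel describes, here is a rough sketch of a reconnect step that retries when the connect attempt throws an IOException (UnknownHostException is a subclass of IOException) instead of giving up after the first failure. This is not zkclient's code and not an actual patch: the Connector interface, the RetryingReconnect class, and the maxAttempts and backoffMillis parameters are all hypothetical names introduced for illustration.

```java
import java.io.IOException;

/** Hypothetical stand-in for whatever opens a new ZooKeeper connection. */
interface Connector {
    void connect() throws IOException;
}

/** Hypothetical helper (not zkclient code): retries a failing connect. */
class RetryingReconnect {
    /**
     * Calls connector.connect() until it succeeds or maxAttempts is reached,
     * sleeping backoffMillis between failed attempts.
     * Returns the number of attempts used; rethrows the last IOException
     * if every attempt failed.
     */
    static int reconnect(Connector connector, int maxAttempts, long backoffMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (IOException e) {
                last = e; // e.g. UnknownHostException during a DNS outage
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IOException("interrupted while waiting to reconnect", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

With a bounded number of attempts the caller still has to decide what to do when all of them fail; the point of the sketch is only that a transient resolution failure need not leave the client permanently without a session.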
> > >
> > > > On Mon, Sep 23, 2013 at 7:02 AM, Anatoly Fayngelerin <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi Everyone,
> > > >>
> > > >> I've run into the following issue with the Kafka server. The

 
Neha Narkhede 2013-09-24, 17:47
Anatoly Fayngelerin 2013-09-24, 21:03