NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Kafka >> mail # dev >> zkclient dies after UnknownHostException in zk reconnect


Re: zkclient dies after UnknownHostException in zk reconnect
Submitted a pull request: https://github.com/sgroschupf/zkclient/pull/24.
On Tue, Sep 24, 2013 at 1:46 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Ya, it is not very active, but you can submit patches to master on
> https://github.com/sgroschupf/zkclient
>
> Thanks,
> Neha
>
>
> On Tue, Sep 24, 2013 at 9:58 AM, Anatoly Fayngelerin <[EMAIL PROTECTED]> wrote:
>
> > That does sound like a saner solution. Which GitHub repo do you submit
> > patches to? It looks like the repo I originally posted on
> > (https://github.com/sgroschupf/zkclient/issues/23) might be a little
> > stale.
> >
> >
> > On Tue, Sep 24, 2013 at 11:34 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for explaining the bug. This is a serious issue that we should fix
> > > at the zkclient level. We have submitted patches to them before, and they
> > > were pretty helpful in releasing a new version with the patch. I think
> > > that will lead to a cleaner solution than trying to work around it in
> > > Kafka code, since zkclient usage is pretty widespread across the server
> > > and consumer code today.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Tue, Sep 24, 2013 at 8:28 AM, Anatoly Fayngelerin <[EMAIL PROTECTED]> wrote:
> > >
> > > > Joel - that is exactly right. ZkClient has no way to notify consumers
> > > > of this situation. The session end event gets fired; however, the
> > > > session begin event never occurs.
> > > >
> > > > Neha - the issue manifested itself when producers were attempting to
> > > > discover topics/brokers. The Kafka brokers had lost their ZK sessions
> > > > during a network outage. The outage was long enough for ZooKeeper to
> > > > expire the sessions corresponding to the ephemeral nodes in /broker/.
> > > > The zkclient bug prevented the brokers from ever re-establishing their
> > > > ZK sessions. Subsequently, no ZooKeeper-based producer was able to
> > > > discover topic->broker mappings. The resulting exceptions looked like:
> > > >
> > > > Caused by: kafka.common.NoBrokersForPartitionException: Partition = null
> > > >     at kafka.producer.Producer.kafka$producer$Producer$getPartitionListForTopic(Producer.scala:167)
> > > >     at kafka.producer.Producer$anonfun$3.apply(Producer.scala:116)
> > > >     at kafka.producer.Producer$anonfun$3.apply(Producer.scala:105)
> > > >     at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
> > > >     at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:233)
> > > >     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
> > > >     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:33)
> > > >     at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> > > >     at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:33)
> > > >     at kafka.producer.Producer.zkSend(Producer.scala:105)
> > > >     at kafka.producer.Producer.send(Producer.scala:99)
> > > >     at com.yieldmo.common.protobuf.ProtoKafkaWriter$class.write(ProtoKafka.scala:20)
> > > >     at com.yieldmo.common.protobuf.ProtoWriter.write(ProtoKafka.scala:40)
> > > >     at com.yieldmo.storm.bolt.KafkaProtoWriterBolt.execute(KafkaProtoWriterBolt.scala:48)
> > > >
> > > > As far as I can see, the only way to deal with this without patching
> > > > zkclient is to periodically check the status of the ZK connection and
> > > > try to detect this kind of situation. I would love to hear better
> > > > ideas for how to handle this.
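The workaround described above (periodically checking the ZK connection and detecting a session expiry that is never followed by a new session) might be sketched roughly as follows. This is an illustration under stated assumptions, not code from Kafka or zkclient: `SessionWatchdog`, its methods, and the grace period are hypothetical. In a real deployment the two `on...` methods would presumably be driven from zkclient's `IZkStateListener` callbacks (`handleStateChanged` with `KeeperState.Expired`, and `handleNewSession`). What to do when the check trips, such as closing and recreating the client, is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical watchdog: remembers when the ZooKeeper session last expired and
 * whether a new session was established afterwards. If no new session arrives
 * within the grace period, the client is assumed to be stuck.
 */
final class SessionWatchdog {
    private final long gracePeriodMillis;
    private boolean awaitingNewSession = false; // true between "session end" and "session begin"
    private long expiredAtMillis = 0;           // when the unanswered expiry happened

    SessionWatchdog(long gracePeriodMillis) {
        this.gracePeriodMillis = gracePeriodMillis;
    }

    // Call when the "session end" (expired) event fires.
    // Pass System.currentTimeMillis() so it matches the periodic check below.
    synchronized void onSessionExpired(long nowMillis) {
        awaitingNewSession = true;
        expiredAtMillis = nowMillis;
    }

    // Call when the "session begin" (new session) event fires.
    synchronized void onNewSession() {
        awaitingNewSession = false;
    }

    // True if an expiry has gone unanswered for longer than the grace period.
    synchronized boolean isStuck(long nowMillis) {
        return awaitingNewSession && nowMillis - expiredAtMillis > gracePeriodMillis;
    }

    // Runs isStuck() on a fixed schedule and calls onStuck on every tick while stuck.
    ScheduledExecutorService startPeriodicCheck(long periodMillis, Runnable onStuck) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> {
            if (isStuck(System.currentTimeMillis())) {
                onStuck.run();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Because `onStuck` keeps firing on every tick while the condition holds, the callback should be idempotent. The executor returned by `startPeriodicCheck` should be shut down when the client is closed.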
> > > >
> > > >
> > > > On Tue, Sep 24, 2013 at 3:31 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > node loss. Did the Kafka consumer not respond to rebalance events or
> > > > > > did the server not respond to state change events? Also, ephemeral

 