The tickTime is set to 3000 and initLimit is set to 5, so readInt() should have gotten a socket timeout exception after 15 seconds (3000 ms x 5). Instead, it got an EOF exception after 14 minutes. I didn't get a chance to take a thread dump when this happened, but has anybody seen something similar?
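For reference, here is a minimal sketch of the two outcomes described above, using plain java.net sockets on loopback rather than ZooKeeper's own classes. The class and method names (ReadTimeoutSketch, initTimeoutMillis, readOutcome) are made up for illustration, and the tickTime * initLimit formula is taken from the post, not checked against the ZooKeeper source. With a non-zero SO_TIMEOUT and a silent peer, readInt() fails with SocketTimeoutException; if the peer closes the connection first, readInt() fails with EOFException.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {
    // Timeout expected in the post: tickTime (ms) multiplied by initLimit.
    static int initTimeoutMillis(int tickTime, int initLimit) {
        return tickTime * initLimit;
    }

    // Opens a loopback connection and reports how a blocking readInt() ends.
    // soTimeoutMillis == 0 means "block forever", as with setSoTimeout(0).
    static String readOutcome(int soTimeoutMillis, boolean peerCloses) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(soTimeoutMillis);
            if (peerCloses) {
                peer.close(); // the other side goes away without sending anything
            }
            DataInputStream in = new DataInputStream(client.getInputStream());
            try {
                in.readInt();
                return "data";
            } catch (SocketTimeoutException e) {
                return "timeout"; // nothing arrived within SO_TIMEOUT
            } catch (EOFException e) {
                return "eof";     // the peer closed the stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(initTimeoutMillis(3000, 5) + " ms expected timeout");
        System.out.println("silent peer: " + readOutcome(200, false));
        System.out.println("closed peer: " + readOutcome(200, true));
    }
}
```

The sketch uses a 200 ms timeout so it finishes quickly. An EOF after 14 minutes instead of a timeout after 15 seconds would be consistent with the read having had no timeout in effect and ending only once the connection was torn down, but that is a guess without the thread dump.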
Re: quorum connection manager shutdown takes long time
Simply removing sock.setSoTimeout(0) from http://s.apache.org/TfI has the unintended consequence of shutting down both the RecvWorker and SendWorker threads in all cases. It seems the current code is designed to keep the socket alive (and the threads running) so that the channel can be reused to communicate with a peer node that is still alive but needs to redo leader election.
I could not reproduce any issue when the threads shut down after the timeout, since new threads are created for the next iteration of leader election. Still, I would rather reuse the threads and the channel, hence I propose the following approach.
This means that users can tune the keep-alive timeouts for TCP sockets so that TCP socket failures propagate to user space sooner, and ZooKeeper also resets the socket if it detects that the other side is not responding when it knows it needs a response within some bounded time.
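A rough sketch of that idea, using plain java.net sockets and not the actual ZooKeeper code: the idle channel keeps SO_TIMEOUT at 0 and turns on SO_KEEPALIVE, while a read that must be answered within a known bound gets a temporary timeout and closes the socket if the bound is exceeded. The names BoundedReadSketch, configureIdleChannel, readExpectedResponse and demo are hypothetical, as is the 200 ms bound.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class BoundedReadSketch {
    // Idle channel: reads block indefinitely, and dead peers are left to
    // TCP keep-alive, whose timing is tuned at the operating system level.
    static void configureIdleChannel(Socket sock) throws IOException {
        sock.setKeepAlive(true);
        sock.setSoTimeout(0);
    }

    // A response is known to be due within boundMillis: apply a temporary
    // read timeout, and close the socket if the peer stays silent.
    // Returns the value read, or null if the socket was closed.
    static Integer readExpectedResponse(Socket sock, int boundMillis) throws IOException {
        sock.setSoTimeout(boundMillis);
        try {
            int value = new DataInputStream(sock.getInputStream()).readInt();
            sock.setSoTimeout(0); // back to the idle, reusable configuration
            return value;
        } catch (SocketTimeoutException e) {
            sock.close();         // peer did not answer in time
            return null;
        }
    }

    // Loopback demonstration of both paths.
    static String demo(boolean peerResponds) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            configureIdleChannel(client);
            if (peerResponds) {
                DataOutputStream out = new DataOutputStream(peer.getOutputStream());
                out.writeInt(42);
                out.flush();
            }
            Integer reply = readExpectedResponse(client, 200);
            return reply == null
                    ? "reset, closed=" + client.isClosed()
                    : "reply=" + reply + ", soTimeout=" + client.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

In this sketch the channel stays reusable after a successful exchange because the timeout is restored to 0, and only a missed bounded response tears the socket down.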
Ideally, I wish there were userspace pings on every socket channel between ZooKeeper nodes to detect dead channels quickly. It seems one exists for the sockets that do Follow/Lead after leader election is done, but not for this one? Such a feature could be added, with care taken to keep it backward compatible.
I posted the above text to Jira. Please point out any wrong assumptions I have made, and share any comments or suggestions.