I'm using hadoop-2.2.0 and rely on Hadoop's WritableRpcEngine to build my distributed application. The application has a 'heartbeat' interface that checks availability periodically. To detect potential failures, I enabled "rpc_timeout" when creating the proxy, as below
Everything went fine initially, and I could see failures being detected by the heartbeat. But after a period of time (2 days or so), I saw a lot of TCP connections in CLOSE_WAIT state on the server side, and the client was no longer able to connect to the server.
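For context on the symptom: a socket sits in CLOSE_WAIT when the remote peer has closed its end but the local application has not yet closed its own socket. The sketch below uses plain JDK sockets, not Hadoop's RPC classes, and the names `CloseWaitSketch` and `run` are hypothetical. It only illustrates the general pattern (a client whose read times out closes the connection, and the server has to notice end-of-stream and close its side); it does not establish that this is what happens inside the Hadoop server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class CloseWaitSketch {

    /**
     * Runs one client/server exchange on the loopback interface.
     * Returns true if the client's read timed out and the server then
     * saw end-of-stream on its side of the connection.
     */
    public static boolean run(int clientTimeoutMillis) throws Exception {
        try (ServerSocket listener =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            final boolean[] serverSawEof = {false};

            Thread server = new Thread(() -> {
                // try-with-resources closes conn when this block exits.
                // Without that close, the server socket would stay in
                // CLOSE_WAIT after the client hangs up.
                try (Socket conn = listener.accept()) {
                    InputStream in = conn.getInputStream();
                    // Never reply, so the client's read timeout fires.
                    // read() returns -1 once the client has closed its end.
                    while (in.read() != -1) {
                        // discard request bytes
                    }
                    serverSawEof[0] = true;
                } catch (IOException e) {
                    // leave serverSawEof false
                }
            });
            server.start();

            boolean timedOut = false;
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                            listener.getLocalPort())) {
                client.setSoTimeout(clientTimeoutMillis); // read timeout
                client.getOutputStream().write(1);        // the "heartbeat"
                client.getOutputStream().flush();
                try {
                    client.getInputStream().read();       // no reply arrives
                } catch (SocketTimeoutException e) {
                    timedOut = true;                      // client gives up
                }
            } // client socket is closed here, which sends FIN to the server

            server.join(5000);
            return timedOut && serverSawEof[0];
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(200));
    }
}
```

While a connection is in this state, it shows up as CLOSE_WAIT in `netstat` output on the side that has not yet closed, which is how the server-side count can be watched over time.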
It's a legacy system that I built 4-5 years back on a Hadoop 0.2.x release, and we only recently moved to the 2.2.0 release. Moving to ProtocolBuffer is one option, but we need to migrate our infrastructure (Hadoop and so on) first and get it working with no regressions.
Is it a known issue?
Thanks

On Tue, Jun 10, 2014 at 10:47 AM, Ted Yu <[EMAIL PROTECTED]> wrote: