Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # user >> zk keeps disconnecting and reconnecting


Copy link to this message
-
zk keeps disconnecting and reconnecting
I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few ZK
clients (version 3.3.0). After the clients have been running for a while,
each of them starts to constantly disconnect and reconnect to the ZK server.
On the client, I saw lots of entries like the following:
2011/08/23 14:42:06.579 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app28.stg/172.18.98.101:12913
2011/08/23 14:42:06.579 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
established to esv4-app28.stg/172.18.98.101:12913, initiating session
2011/08/23 14:42:06.581 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment
complete on server esv4-app28.stg/172.18.98.101:12913, sessionid 0x331f77a1ed80004, negotiated timeout = 6000
2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/23 14:42:06.583 WARN [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004
for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing
socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range:
-3        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)
2011/08/23 14:42:07.510 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/08/23 14:42:07.511 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
established to esv4-app29.stg/172.18.98.89:12913, initiating session
2011/08/23 14:42:07.512 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f
77a1ed80004, negotiated timeout = 6000
2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/23 14:42:07.552 WARN [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004
for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos
ing socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range:
-3        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)

On the ZK server, I saw lots of entries like these:
2011-08-23 14:34:28,802 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507
2011-08-23 14:34:28,803 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
1581489222990563982011-08-23 14:34:28,803 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
0x231db9b5f3f010e with negotiated timeout 6000 for client /172.18.98.95:345
072011-08-23 14:34:28,805 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
read additional data from client sessionid 0x231db9b5f3
f010e, likely client has closed socket
2011-08-23 14:34:28,805 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
client /172.18.98.95:34507 which had sessionid 0x231db9b5
f3f010e
2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881
2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
1581489222990563962011-08-23 14:34:29,513 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
0x231db9b5f3f010c with negotiated timeout 6000 for client /172.18.98.94:588
812011-08-23 14:34:29,514 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
read additional data from client sessionid 0x231db9b5f3
f010c, likely client has closed socket
2011-08-23 14:34:29,514 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
client /172.18.98.94:58881 which had sessionid 0x231db9b5
f3f010c

Anybody knows what could be the problem? A few other things that may help
understand the problem: each ZK client is also connecting to another cluster
of ZK servers in the same JVM and all the ZK connections use the ZK
namespace (chroot).

Thanks,

Jun
+
Ted Dunning 2011-08-23, 23:07
+
Jun Rao 2011-08-23, 23:24
+
Mahadev Konar 2011-08-24, 03:42
+
Patrick Hunt 2011-08-25, 16:34
+
Jun Rao 2011-08-29, 04:34
+
Jun Rao 2011-08-29, 16:39
+
Fournier, Camille F. 2011-08-29, 17:50
+
Mahadev Konar 2011-08-29, 18:10
+
Fournier, Camille F. 2011-08-29, 19:38
+
Mahadev Konar 2011-08-29, 19:45
+
Benjamin Reed 2011-08-30, 23:45
+
Camille Fournier 2011-08-30, 23:50
+
Jun Rao 2011-08-31, 01:48
+
kishore g 2011-08-31, 17:02
+
Fournier, Camille F. 2011-09-01, 14:06
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB