Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper, mail # user - zk keeps disconnecting and reconnecting


Copy link to this message
-
zk keeps disconnecting and reconnecting
Jun Rao 2011-08-23, 21:58
I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few ZK
clients (version 3.3.0). After the clients have been running for a while,
each of them starts to constantly disconnect and reconnect to the ZK server.
On the client, I saw lots of entries like the following:
2011/08/23 14:42:06.579 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app28.stg/172.18.98.101:12913
2011/08/23 14:42:06.579 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
established to esv4-app28.stg/172.18.98.101:12913, initiating session
2011/08/23 14:42:06.581 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment
complete on server esv4-app28.stg/172.18.98.101:12913, sessionid 0x331f77a1ed80004, negotiated timeout = 6000
2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/23 14:42:06.583 WARN [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004
for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing
socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range:
-3        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)
2011/08/23 14:42:07.510 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/08/23 14:42:07.511 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
established to esv4-app29.stg/172.18.98.89:12913, initiating session
2011/08/23 14:42:07.512 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f
77a1ed80004, negotiated timeout = 6000
2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/23 14:42:07.552 WARN [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004
for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos
ing socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range:
-3        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)

On the ZK server, I saw lots of entries like these:
2011-08-23 14:34:28,802 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507
2011-08-23 14:34:28,803 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
1581489222990563982011-08-23 14:34:28,803 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
0x231db9b5f3f010e with negotiated timeout 6000 for client /172.18.98.95:345
072011-08-23 14:34:28,805 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
read additional data from client sessionid 0x231db9b5f3
f010e, likely client has closed socket
2011-08-23 14:34:28,805 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
client /172.18.98.95:34507 which had sessionid 0x231db9b5
f3f010e
2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881
2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
1581489222990563962011-08-23 14:34:29,513 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
0x231db9b5f3f010c with negotiated timeout 6000 for client /172.18.98.94:588
812011-08-23 14:34:29,514 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
read additional data from client sessionid 0x231db9b5f3
f010c, likely client has closed socket
2011-08-23 14:34:29,514 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
client /172.18.98.94:58881 which had sessionid 0x231db9b5
f3f010c

Anybody knows what could be the problem? A few other things that may help
understand the problem: each ZK client is also connecting to another cluster
of ZK servers in the same JVM and all the ZK connections use the ZK
namespace (chroot).

Thanks,

Jun
+
Ted Dunning 2011-08-23, 23:07
+
Jun Rao 2011-08-23, 23:24
+
Mahadev Konar 2011-08-24, 03:42
+
Patrick Hunt 2011-08-25, 16:34
+
Jun Rao 2011-08-29, 04:34
+
Jun Rao 2011-08-29, 16:39
+
Fournier, Camille F. 2011-08-29, 17:50
+
Mahadev Konar 2011-08-29, 18:10
+
Fournier, Camille F. 2011-08-29, 19:38
+
Mahadev Konar 2011-08-29, 19:45
+
Benjamin Reed 2011-08-30, 23:45
+
Camille Fournier 2011-08-30, 23:50
+
Jun Rao 2011-08-31, 01:48
+
kishore g 2011-08-31, 17:02
+
Fournier, Camille F. 2011-09-01, 14:06