ZooKeeper Clients waiting forever (hanging threads)
Gunnar Wagenknecht 2011-06-21, 07:14
Hi,
I have an issue with ZK clients waiting forever. The stack for the waiting threads looks like the following.

> java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>   - locked <0x00002aab19a019b0> (a org.apache.zookeeper.ClientCnxn$Packet)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1241)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
>   ...

Looking at the stack dump further, I noticed many more hung threads, all with a similar call stack (though with different client calls).

> java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>   - locked <0x00002aab19a013a8> (a org.apache.zookeeper.ClientCnxn$Packet)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>   ...

Looking at the logs, it seems that all this started with a connection loss during the night.

03:31:32.855 [Worker-76] WARN ... KeeperErrorCode = ConnectionLoss ...
03:31:32.867 [Worker-65] WARN ... KeeperErrorCode = ConnectionLoss ...
However, then I found this:

03:32:49.417 [ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)] ERROR org.apache.zookeeper.ClientCnxn - from ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)
java.lang.OutOfMemoryError: Java heap space
  at java.util.HashMap.resize(HashMap.java:462) ~[na:1.6.0_24]
  at java.util.HashMap.addEntry(HashMap.java:755) ~[na:1.6.0_24]
  at java.util.HashMap.put(HashMap.java:385) ~[na:1.6.0_24]
  at java.util.HashSet.add(HashSet.java:200) ~[na:1.6.0_24]
  at java.util.AbstractCollection.addAll(AbstractCollection.java:305) ~[na:1.6.0_24]
  at org.apache.zookeeper.ZooKeeper$ZKWatchManager.materialize(ZooKeeper.java:165) ~[na:na]
  at org.apache.zookeeper.ClientCnxn$EventThread.queueEvent(ClientCnxn.java:474) ~[na:na]
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1172) ~[na:na]

I was wondering if this may have caused any race condition in the ZK client?

-Gunnar

--
Gunnar Wagenknecht
[EMAIL PROTECTED]
http://wagenknecht.org/
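
To illustrate the suspected failure mode: the stacks above show callers parked in an untimed Object.wait() on a packet, and the log shows an OutOfMemoryError escaping SendThread.run. If the thread that is supposed to complete a packet dies first, nothing ever calls notify and the waiters hang. The following is a minimal sketch using only the JDK, not ZooKeeper code. The names HungWaiterDemo, Packet, awaitFinished, and senderDeathCausesTimeout are hypothetical, and the bounded wait is one possible way to make such a hang visible, not something the ZK client is known to do.

```java
import java.util.concurrent.TimeoutException;

public class HungWaiterDemo {
    /** Hypothetical stand-in for a queued request; NOT the real ClientCnxn.Packet. */
    public static final class Packet {
        boolean finished;
    }

    /** Waits until the packet is marked finished, giving up after timeoutMs. */
    public static void awaitFinished(Packet packet, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (packet) {
            while (!packet.finished) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw new TimeoutException("no reply within " + timeoutMs + " ms");
                }
                // A plain packet.wait() here would block forever if nobody notifies.
                packet.wait(remainingMs);
            }
        }
    }

    /** Returns true if the waiter timed out because the sender died before notifying. */
    public static boolean senderDeathCausesTimeout() throws InterruptedException {
        Packet packet = new Packet();
        // Simulated sender: an Error escapes its run loop before the packet is completed.
        Thread sender = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "sender");
        // Swallow the simulated error so the demo does not print a stack trace.
        sender.setUncaughtExceptionHandler((thread, error) -> { });
        sender.start();
        sender.join();
        try {
            awaitFinished(packet, 200);
            return false;
        } catch (TimeoutException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + senderDeathCausesTimeout());
    }
}
```

With an untimed wait in place of the bounded one, the calling thread would stay in WAITING (on object monitor) indefinitely, which matches the shape of the thread dumps above. Whether that is what actually happened in the client is still the open question.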