Zookeeper, mail # user - ZooKeeper Clients waiting forever (hanging threads)


ZooKeeper Clients waiting forever (hanging threads)
Gunnar Wagenknecht 2011-06-21, 07:14
Hi,

I have an issue with ZK clients waiting forever. The stack for the
waiting threads looks like the following.

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a019b0>
>    (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1241)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
>  ...

Looking at the thread dump further, I noticed many more hung threads, all
with a similar call stack (though from different client calls).

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a013a8>
>    (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>  ...
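
To check a running JVM for the same symptom without reading a full thread dump by hand, something like the following might help. It is only a minimal sketch: the class `HungThreadFinder` and its two methods are names made up for illustration, and it relies on nothing but standard JDK calls (`Thread.getAllStackTraces()`, `Thread.getState()`). It lists the threads that are in state WAITING and have a given frame on their stack, which is the pattern seen in the two dumps above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ZooKeeper.
public class HungThreadFinder {

    /** Returns true if the stack contains a frame for className.methodName. */
    public static boolean hasFrame(StackTraceElement[] stack, String className, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads that are WAITING with the given frame on their stack. */
    public static List<String> findWaitingIn(String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // WAITING (as opposed to TIMED_WAITING) means a wait with no timeout.
            if (t.getState() == Thread.State.WAITING
                    && hasFrame(e.getValue(), className, methodName)) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Class and method names taken from the stack traces in this message.
        List<String> hung = findWaitingIn("org.apache.zookeeper.ClientCnxn", "submitRequest");
        System.out.println("Threads waiting in ClientCnxn.submitRequest: " + hung);
    }
}
```

A thread that shows up here is not necessarily hung, since a healthy client also waits briefly in `submitRequest`. Seeing the same thread names in repeated samples over a longer period would be the stronger signal.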

Looking at the logs, it seems that all this started with a connection
loss during the night.

03:31:32.855 [Worker-76] WARN  ... KeeperErrorCode = ConnectionLoss ...
03:31:32.867 [Worker-65] WARN  ... KeeperErrorCode = ConnectionLoss ...

However, then I found this:

03:32:49.417 [ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)] ERROR org.apache.zookeeper.ClientCnxn - from ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.resize(HashMap.java:462) ~[na:1.6.0_24]
        at java.util.HashMap.addEntry(HashMap.java:755) ~[na:1.6.0_24]
        at java.util.HashMap.put(HashMap.java:385) ~[na:1.6.0_24]
        at java.util.HashSet.add(HashSet.java:200) ~[na:1.6.0_24]
        at java.util.AbstractCollection.addAll(AbstractCollection.java:305) ~[na:1.6.0_24]
        at org.apache.zookeeper.ZooKeeper$ZKWatchManager.materialize(ZooKeeper.java:165) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$EventThread.queueEvent(ClientCnxn.java:474) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1172) ~[na:na]

Could this OutOfMemoryError in the SendThread have caused a race condition in the ZK client?
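
To make the suspicion concrete, here is a minimal sketch of the general pattern I have in mind. It is NOT the ZooKeeper client code: `LostNotifyDemo`, `Packet`, and `run` are hypothetical names, and the sender's failure is simulated by simply returning early. The only things taken from the dumps above are that the callers sit in an untimed `Object.wait()` on a packet object and that the thread expected to complete the packets hit an error. If that thread stops before marking a packet finished and notifying, nothing ever wakes the caller.

```java
// Illustration only: hypothetical classes, not ZooKeeper's implementation.
public class LostNotifyDemo {

    /** Stand-in for a pending request that a caller waits on. */
    static class Packet {
        boolean finished;
    }

    /**
     * Starts a caller that waits (without timeout) for the packet to finish and
     * a sender that is supposed to finish it. Returns the caller's thread state
     * after the sender has ended and the caller had 500 ms to complete.
     */
    public static Thread.State run(boolean senderDies) {
        Packet packet = new Packet();

        Thread caller = new Thread(() -> {
            synchronized (packet) {
                while (!packet.finished) {
                    try {
                        packet.wait(); // untimed wait: relies entirely on a notify
                    } catch (InterruptedException e) {
                        return; // only used to clean up the demo thread
                    }
                }
            }
        }, "caller");

        Thread sender = new Thread(() -> {
            if (senderDies) {
                return; // simulated failure: thread ends without notifying
            }
            synchronized (packet) {
                packet.finished = true;
                packet.notifyAll();
            }
        }, "sender");

        caller.setDaemon(true);
        caller.start();
        sender.start();
        try {
            sender.join();
            caller.join(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State state = caller.getState();
        caller.interrupt(); // release the caller if it is still waiting
        return state;
    }

    public static void main(String[] args) {
        System.out.println("sender dies:  caller is " + run(true));
        System.out.println("sender works: caller is " + run(false));
    }
}
```

On a typical run I would expect the first call to report WAITING and the second TERMINATED. Whether the real client can get into this situation after an error in its send thread is exactly what I am asking about.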

-Gunnar

--
Gunnar Wagenknecht
[EMAIL PROTECTED]
http://wagenknecht.org/