ZooKeeper Clients waiting forever (hanging threads)
Hi,

I have an issue with ZK clients waiting forever. The stack for the
waiting threads looks like the following.

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a019b0>
>    (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1241)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
>  ...
Looking further at the stack, I noticed many more hung threads, all with a
similar call stack (though from different client calls).

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a013a8>
>    (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>  ...
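
(For reference, these are plain synchronous calls; a rough sketch of the
call pattern, with placeholder paths rather than our actual code:)

import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Synchronous calls like these block inside ClientCnxn.submitRequest(),
// which wait()s on the request Packet until the client's SendThread marks
// it finished and notifies it. If that notification never comes, the
// calling thread waits forever.
class SyncCallPattern {
    static void example(ZooKeeper zk) throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren("/some/path", true);  // placeholder path
        zk.exists("/some/other/path", true);                         // placeholder path
    }
}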

Looking at the logs, it seems that all of this started with a connection
loss during the night.

03:31:32.855 [Worker-76] WARN  ... KeeperErrorCode = ConnectionLoss ...
03:31:32.867 [Worker-65] WARN  ... KeeperErrorCode = ConnectionLoss ...
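
(These warnings presumably correspond to KeeperException.ConnectionLossException
coming back from client calls; a generic retry wrapper, shown only as a sketch
and not our actual code, would look like:)

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch only: ConnectionLoss means the request may or may not have reached
// the server; callers typically log a warning (as above) and retry once the
// session reconnects.
class RetryOnConnectionLoss {
    static Stat existsWithRetry(ZooKeeper zk, String path, int maxRetries)
            throws KeeperException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return zk.exists(path, false);
            } catch (KeeperException.ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e;           // give up after a few attempts
                }
                Thread.sleep(1000L);   // back off before retrying
            }
        }
    }
}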

However, I then found this:

03:32:49.417 [ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)] ERROR org.apache.zookeeper.ClientCnxn - from ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.resize(HashMap.java:462) ~[na:1.6.0_24]
        at java.util.HashMap.addEntry(HashMap.java:755) ~[na:1.6.0_24]
        at java.util.HashMap.put(HashMap.java:385) ~[na:1.6.0_24]
        at java.util.HashSet.add(HashSet.java:200) ~[na:1.6.0_24]
        at java.util.AbstractCollection.addAll(AbstractCollection.java:305) ~[na:1.6.0_24]
        at org.apache.zookeeper.ZooKeeper$ZKWatchManager.materialize(ZooKeeper.java:165) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$EventThread.queueEvent(ClientCnxn.java:474) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1172) ~[na:na]
I was wondering whether this might have caused a race condition in the ZK client?
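
(As a stop-gap, and only as a sketch using the standard async API rather than
a real fix, one could bound how long a caller waits instead of blocking
indefinitely:)

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Rough sketch: the async exists() never blocks the caller; the caller then
// waits for the callback with a bounded timeout instead of waiting forever.
class BoundedExists {
    static Stat existsWithTimeout(ZooKeeper zk, String path, long timeoutMs)
            throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        final Stat[] result = new Stat[1];
        zk.exists(path, false, new AsyncCallback.StatCallback() {
            public void processResult(int rc, String p, Object ctx, Stat stat) {
                result[0] = stat;  // null if the node is missing or the call failed
                done.countDown();
            }
        }, null);
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null;  // timed out; the caller decides how to recover
        }
        return result[0];
    }
}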

-Gunnar

--
Gunnar Wagenknecht
[EMAIL PROTECTED]
http://wagenknecht.org/