Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # dev >> ephemeral node not deleted after client session closed


Copy link to this message
-
ephemeral node not deleted after client session closed
Hi,

We are seeing this issue again with our Kafka zookeeper setup. Let me
explain with the following log entries -

Here is the node that didn't get deleted -

get /kafka-tracking/brokers/ids/528
172.18.98.101-1320781574264:172.18.98.101:10251
cZxid = 0x605f90a78ctime = Tue Nov 08 11:46:26 PST 2011
mZxid = 0x605f90a78mtime = Tue Nov 08 11:46:26 PST 2011
pZxid = 0x605f90a78cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x23220b93b4a33bc
dataLength = 47
numChildren = 0

I ran Kishore's awesome version of the LogFormatter tool on the
relevant portion of the zookeeper transaction logs -

time:11/8/11 11:46:14 AM PST session:0x23220b93b4a33bc cxid:0x0
zxid:0x605f90a59 type:createSession timeOut:6000
time:11/8/11 11:46:26 AM PST session:0x23220b93b4a33bc cxid:0x0
zxid:0x605f90a72 type:closeSession
time:11/8/11 11:46:26 AM PST session:0x23220b93b4a33bc cxid:0x1
zxid:0x605f90a78 type:create acl:[31,s{'world,'anyone}

We want to understand why there was a 12 second delay in creation of
znode after createSession

Checking the logs, we found that the there is a 30+ sec delay between
the commitProcessor and FinalRequestProcessor.

2011-11-08 11:46:14,661 - DEBUG
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
request:: sessionid:0x23220b93b4a33bc type:createSession
 cxid:0x0 zxid:0x605f90a59 txntype:-10 reqpath:n/a
2011-11-08 11:46:26,045 - DEBUG
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
request:: sessionid:0x23220b93b4a33bc type:closeSession
cxid:0x0 zxid:0x605f90a72 txntype:-11 reqpath:n/a
2011-11-08 11:46:26,047 - DEBUG
[QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
request:: sessionid:0x23220b93b4a33bc type:create cxid:0
x1 zxid:0x605f90a78 txntype:1 reqpath:n/a
2011-11-08 11:46:59,539 - DEBUG
[CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
sessionid:0x23220b93b4a33bc type:createSession cxid:0x0 z
xid:0x605f90a59 txntype:-10 reqpath:n/a
2011-11-08 11:46:59,587 - DEBUG
[CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
sessionid:0x23220b93b4a33bc type:closeSession cxid:0x0 zx
id:0x605f90a72 txntype:-11 reqpath:n/a
2011-11-08 11:46:59,589 - DEBUG
[CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
sessionid:0x23220b93b4a33bc type:create cxid:0x1 zxid:0x6
05f90a78 txntype:1 reqpath:n/a

Here is the stat output from our zookeeper servers -

Latency min/avg/max: 0/53/44712
Latency min/avg/max: 0/11/11727
Latency min/avg/max: 0/12/11994
Latency min/avg/max: 0/9/11707

The relevant bug is here - https://issues.apache.org/jira/browse/ZOOKEEPER-1208

This is a pretty serious issue since the fix is to have someone
manually delete the ephemeral node. And until that is done, it blocks
the startup of either the consumer client or the Kafka servers. Any
help here in debugging this issue is appreciated !

Thanks,
Neha
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB