-
Re: ephemeral node not deleted after client session closed
kishore g 2011-11-11, 18:15
Hi Pat,
We are already working on that. The problem came from deploying ZK alongside other applications; we will have separate boxes for ZK very soon.
I already looked at the stats and could not correlate them with the spikes; unfortunately we didn't have GC logs. A few deployment rules were clearly violated, and we will fix them.
The good news is that we found an issue :-). Thanks again for your help.
thanks,
Kishore G
On Fri, Nov 11, 2011 at 9:47 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> We always triage new issues as they come in (same when 1208 originally
> came in). However our ability to determine the cause is often bounded
> by the information provided by the user, which in this recent update
> was excellent and pointed out exactly the flaw. Kudos.
>
> We'll button up this issue (1208) for 3.3/3.4/trunk. After which I'll
> work on cutting a new 3.3.4 that includes this and some other fixes.
> It would be good if you could test this patch in the mean time.
>
> Also, this is particularly worrisome for me, you reported the
> following for your servers:
>
> Latency min/avg/max: 0/53/44712
> Latency min/avg/max: 0/11/11727
> Latency min/avg/max: 0/12/11994
> Latency min/avg/max: 0/9/11707
>
> That's really really terrible performance and you won't be happy with
> ZK as a result. You need to followup with your ops team to determine
> why the performance you are seeing is so terrible. Both in terms of
> average and max latency.
>
> Spikes in max latency is usually due to GC, swap, or bad disk
> performance for the WAL.
> Bad average latency might indicate poor network performance, or again
> bad disk performance.
>
> Try turning on CMS/parallelGC. Also try using iostat and look at the
> await times you're seeing for the WAL disk (correlate that with spikes
> in max latency, those counters can be reset using a 4letterword).
>
> Regards,
>
> Patrick
>
> On Fri, Nov 11, 2011 at 7:21 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > Pat,
> >
> > That is excellent turnaround ! I will take a look at the running the test
> > as well as your patch. Will be a good opportunity for me to start
> > understanding the zookeeper codebase.
> >
> > Thanks again,
> > Neha
> >
> > On Thursday, November 10, 2011, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> >> Ok, patch posted that fixes this (1208). Committers please take a look.
> >>
> >> Neha you might want to give a patched version a try. Awesome job
> >> helping to document and track down this issue. Thanks!
> >>
> >> Patrick
> >>
> >> On Thu, Nov 10, 2011 at 4:43 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> >>> See my update to 1208 for a test that demonstrates this.
> >>>
> >>> On Thu, Nov 10, 2011 at 3:31 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >>>> Thanks Patrick for looking into this issue !
> >>>>
> >>>>>> The logs would indicate if an election happens. Look for "LOOKING" or
> >>>> "LEADING" or "FOLLOWING".
> >>>>
> >>>> The logs don't have any such entries. So I'm guessing there was no election
> >>>> happening.
> >>>>
> >>>> Do you have thoughts, though, on how easy it would be to reproduce this
> >>>> bug, to verify the bug fix ?
> >>>>
> >>>> Thanks,
> >>>> Neha
> >>>>
> >>>>
> >>>> On Thu, Nov 10, 2011 at 2:08 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> On Thu, Nov 10, 2011 at 1:52 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >>>>> > Thanks for the quick responses, guys! Please find my replies inline -
> >>>>> >
> >>>>> >>> 1) Why is the session closed, the client closed it or the cluster
> >>>>> > expired it?
> >>>>> > Cluster expired it.
> >>>>> >
> >>>>>
> >>>>> Yes, I realized after that the cxid is 0 in your logs - that indicates
> >>>>> it was expired and not closed explicitly by the client.
> >>>>>
> >>>>>
> >>>>> >>> 3) the znode exists on all 4 servers, is that right?
> >>>>> > Yes
> >>>>> >
> >>>>>
> >>>>> This holds up my theory that the PrepRequestProcessor is accepting a
> >>>>> create from the client after the session has been expired.
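For anyone who wants to track the latency figures Patrick quotes over time, here is a minimal sketch that parses the "Latency min/avg/max" line out of a server's `stat` output and flags servers above a threshold. The server names and both thresholds are hypothetical, the values are assumed to be milliseconds, and how the `stat` text is collected is left out.

```python
import re

# Matches the "Latency min/avg/max: a/b/c" line as quoted in the thread.
LATENCY_RE = re.compile(r"Latency min/avg/max:\s*(\d+)/(\d+)/(\d+)")


def parse_latency(stat_text):
    """Return (min, avg, max) as ints, or None if no latency line is present."""
    match = LATENCY_RE.search(stat_text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def flag_slow(samples, max_threshold=1000, avg_threshold=10):
    """Return names of servers whose max or avg latency exceeds a threshold.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    return [
        name
        for name, (_lo, avg, hi) in samples.items()
        if hi > max_threshold or avg > avg_threshold
    ]


# Hypothetical server names paired with the four lines quoted above.
samples = {
    "server1": parse_latency("Latency min/avg/max: 0/53/44712"),
    "server2": parse_latency("Latency min/avg/max: 0/11/11727"),
    "server3": parse_latency("Latency min/avg/max: 0/12/11994"),
    "server4": parse_latency("Latency min/avg/max: 0/9/11707"),
}

if __name__ == "__main__":
    print(flag_slow(samples))
```

Sampling this at intervals and resetting the counters between samples is what makes it possible to line a max-latency spike up against iostat await times or GC pauses.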
-
ephemeral node not deleted after client long gone
Jun Rao 2011-02-11, 18:01
Hi,
I found an issue in zookeeper 3.3.0 where an ephemeral node didn't get deleted after the client was long gone. This seems to be a rare event and happens 1 out of 600 tries. Has there been a similar problem reported/fixed? Thanks,
Jun
-
Re: ephemeral node not deleted after client long gone
Mahadev Konar 2011-02-11, 18:08
Hi Jun,
Yes there was a bug reported:

https://issues.apache.org/jira/browse/ZOOKEEPER-919

Is this what you are seeing?

thanks
mahadev
-
Re: ephemeral node not deleted after client long gone
Jun Rao 2011-02-11, 18:43
Can the problems fixed in https://issues.apache.org/jira/browse/ZOOKEEPER-962 and https://issues.apache.org/jira/browse/ZOOKEEPER-919 happen even when there is no restart in the ZK server ensemble? For the problem that I have seen, the ZK servers have always been up.

Thanks,

Jun
-
Re: ephemeral node not deleted after client long gone
Mahadev Konar 2011-02-11, 18:54
Jun Rao,
No it cannot happen without a zookeeper restart.

Are you sure you are shutting down the client?

thanks
mahadev
-
Re: ephemeral node not deleted after client long gone
Jun Rao 2011-02-11, 19:58
Hmm, I am pretty sure the client that created the ephemeral node is gone. That client typically creates a bunch of ephemeral nodes. It seems that all nodes except one are gone. The hanging ephemeral node can be read from any ZK server and its info is listed below. Is there a way to get information about the client that created an ephemeral node (host, process ID, etc.)?

ctime = Fri Feb 11 04:39:25 PST 2011
mZxid = 0x1f03f5ea35
mtime = Fri Feb 11 04:39:25 PST 2011
pZxid = 0x1f03f5ea35
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x22cab09a7380273
dataLength = 40
numChildren = 0

Thanks,

Jun
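As a rough illustration of reading the stat listing in this message, the sketch below turns the `key = value` lines into a dictionary and pulls out `ephemeralOwner`, which holds the owning session id for an ephemeral node and is 0 for a persistent one. The function names are made up for this example, and the input is simply the text pasted in the message.

```python
STAT_TEXT = """\
ctime = Fri Feb 11 04:39:25 PST 2011
mZxid = 0x1f03f5ea35
mtime = Fri Feb 11 04:39:25 PST 2011
pZxid = 0x1f03f5ea35
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x22cab09a7380273
dataLength = 40
numChildren = 0
"""


def parse_stat(text):
    """Parse 'key = value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            fields[key.strip()] = value.strip()
    return fields


def ephemeral_owner(fields):
    """Return the owning session id as an int, or None for a persistent node."""
    owner = int(fields.get("ephemeralOwner", "0x0"), 16)
    return owner or None


if __name__ == "__main__":
    owner = ephemeral_owner(parse_stat(STAT_TEXT))
    print(hex(owner) if owner is not None else "persistent node")
```

The session id obtained this way is the value to search for in the server logs when trying to identify the client.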
-
RE: ephemeral node not deleted after client long gone
Fournier, Camille F. [Tec... 2011-02-11, 21:32
There should be a log line somewhere associating that ephemeralOwner sessionID to a login, something like:
2010-12-07 02:04:26,824 - INFO [CommitProcessor:0:NIOServerCnxn@1580] - Established session 0x2cbe924f570000 with negotiated timeout 30000 for client /10.150.27.112:53673
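A minimal sketch of that lookup, assuming the server log contains lines in the "Established session ... for client /host:port" form quoted in this message. Other server versions may log session creation differently, so the pattern is an assumption and `find_client` is a made-up helper name.

```python
import re

# Pattern modelled on the single log line quoted in this message.
SESSION_RE = re.compile(
    r"Established session (0x[0-9a-fA-F]+) with negotiated timeout (\d+) "
    r"for client /([\d.]+):(\d+)"
)


def find_client(log_lines, session_id):
    """Return (host, port) of the client that established session_id, or None.

    Session ids are compared numerically so that differences in hex case or
    leading zeros do not matter.
    """
    wanted = int(session_id, 16)
    for line in log_lines:
        match = SESSION_RE.search(line)
        if match and int(match.group(1), 16) == wanted:
            return match.group(3), int(match.group(4))
    return None


SAMPLE_LOG = [
    "2010-12-07 02:04:26,824 - INFO [CommitProcessor:0:NIOServerCnxn@1580] - "
    "Established session 0x2cbe924f570000 with negotiated timeout 30000 "
    "for client /10.150.27.112:53673",
]

if __name__ == "__main__":
    print(find_client(SAMPLE_LOG, "0x2cbe924f570000"))
```

In practice the lines would come from the log files of every server in the ensemble, since the session may have been established on any of them.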
-
Re: ephemeral node not deleted after client long gone
Jun Rao 2011-02-11, 22:49
I saw the following entries on the session. It seems that the client that created the ephemeral node has already been closed. Also, I was using ZK 3.2.1 server and ZK 3.3.0 client. Any issues like that related to ZK 3.2.1? Thanks,

2011-02-11 04:39:07,350 - INFO [NIOServerCxn.Factory:12913:NIOServerCnxn@615] - Creating new session 0x22cab09a7380273
2011-02-11 04:39:25,803 - INFO [CommitProcessor:0:NIOServerCnxn@833] - closing session:0x22cab09a7380273 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/172.17.135.39:12913 remote=/172.16.78.170:45385]

Jun