Zookeeper >> mail # dev >> ephemeral node not deleted after client session closed
Re: ephemeral node not deleted after client session closed
Thanks, Patrick, for looking into this issue!

>> The logs would indicate if an election happens. Look for "LOOKING" or
"LEADING" or "FOLLOWING".

The logs don't have any such entries, so I'm guessing no election happened.
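To repeat that check across several server logs, a small scanner can count the three election-state markers Patrick mentions. This is a minimal sketch. It assumes the markers appear as whole words somewhere on a log line and does not parse ZooKeeper's actual log layout. The script name in the usage comment is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Election-state markers named in the thread. Matching them as whole words
# is an assumption about the log layout, not a parser for ZooKeeper's logs.
ELECTION_MARKERS = ("LOOKING", "LEADING", "FOLLOWING")
_PATTERN = re.compile(r"\b(" + "|".join(ELECTION_MARKERS) + r")\b")


def election_markers(lines: Iterable[str]) -> Counter:
    """Count occurrences of each election-state marker in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        for match in _PATTERN.finditer(line):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python scan_election.py server1.log server2.log
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            total += election_markers(f)
    if total:
        for marker, n in sorted(total.items()):
            print(f"{marker}: {n}")
    else:
        print("no election-state markers found")
```

An empty result across all servers is consistent with "no election", but it only means the strings never appear. If log levels were raised or the logs rotated out, the absence proves less.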

Do you have any thoughts, though, on how easy it would be to reproduce this
bug, so that we can verify the fix?

Thanks,
Neha
On Thu, Nov 10, 2011 at 2:08 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 10, 2011 at 1:52 PM, Neha Narkhede <[EMAIL PROTECTED]>
> wrote:
> > Thanks for the quick responses, guys! Please find my replies inline -
> >
> > >>> 1) Why was the session closed: did the client close it, or did the
> > cluster expire it?
> > Cluster expired it.
> >
>
> Yes, I realized afterwards that the cxid is 0 in your logs; that indicates
> it was expired and not closed explicitly by the client.
>
>
> >>> 3) the znode exists on all 4 servers, is that right?
> > Yes
> >
>
> This holds up my theory that the PrepRequestProcessor is accepting a
> create from the client after the session has been expired.
>
> >>> 5) why are your max latencies, as well as avg latency, so high?
> >>> a) are these dedicated boxes, not virtualized, correct?
> > These are dedicated boxes, but ZK is currently co-located with Kafka,
> > although on different disks.
> >
> > >>> b) is the JVM going into a GC pause? (try turning on verbose logging, or
> > use "jstat" with the gc options to see the history if you still have
> > those jvms running)
> > I don't believe we had GC logs on these machines, so it's unclear.
> >
> >>> d) do you have dedicated spindles for the ZK WAL? If not another
> > process might be causing the fsyncs to pause. (you can use iostat or
> > strace to monitor this)
> > No. The log4j and zk txn logs share the same disks.
> >
> >>> Is that the log from the server that's got the 44sec max latency?
> > Yes.
> >
> >>> This is 3.3.3 ?
> > Yes.
> >
> >>> was there any instability in the quorum itself during this time
> > period?
> > How do I find that out?
>
> The logs would indicate if an election happens. Look for "LOOKING" or
> "LEADING" or "FOLLOWING".
>
>
> Your comments are consistent with my theory. Seems like a bug in PRP
> session validation to me.
>
> Patrick
>
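Patrick's theory is a race: a create for an ephemeral node is accepted after the session has already expired, so the expiry cleanup has nothing to delete and the new node is left orphaned. The toy model below makes that interleaving deterministic. This is useful when reasoning about how the fix could be verified. It is a hypothetical sketch, not ZooKeeper's code. `ToyServer`, `validate_session`, `reproduce` and the path `/app/node` are invented for illustration. The `validate_session` flag stands in for the session check that Patrick suspects is missing in the PrepRequestProcessor.

```python
from dataclasses import dataclass, field

# Hypothetical toy model (not ZooKeeper's implementation) of the suspected race:
# the session expiry runs first, then a stale create from that session is processed.


@dataclass
class ToyServer:
    validate_session: bool  # whether the create path re-checks that the session is live
    live_sessions: set = field(default_factory=set)
    ephemerals: dict = field(default_factory=dict)  # path -> owning session id

    def open_session(self, sid: int) -> None:
        self.live_sessions.add(sid)

    def expire_session(self, sid: int) -> None:
        # Expiry removes the session and every ephemeral node it currently owns.
        self.live_sessions.discard(sid)
        for path in [p for p, owner in self.ephemerals.items() if owner == sid]:
            del self.ephemerals[path]

    def create_ephemeral(self, sid: int, path: str) -> bool:
        # With validation on, a create from an expired session is rejected.
        if self.validate_session and sid not in self.live_sessions:
            return False
        self.ephemerals[path] = sid
        return True


def reproduce(validate_session: bool) -> dict:
    """Force the problematic ordering and return the ephemeral nodes left behind."""
    server = ToyServer(validate_session=validate_session)
    server.open_session(1)
    server.expire_session(1)                # the session times out first...
    server.create_ephemeral(1, "/app/node")  # ...then the queued create is processed
    return server.ephemerals


if __name__ == "__main__":
    print(reproduce(validate_session=False))  # {'/app/node': 1}: orphaned ephemeral
    print(reproduce(validate_session=True))   # {}: create rejected, nothing left behind
```

In the model the window is forced by call order. Against a real server, the same ordering presumably has to be provoked by delaying request processing around the expiry, for example during long pauses like the latencies discussed above. This sketch does not attempt that.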