Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # dev >> ephemeral node not deleted after client session closed


Copy link to this message
-
Re: ephemeral node not deleted after client session closed
Pat,

That is excellent turnaround ! I will take a look at the running the test
as well as your patch. Will be a good opportunity for me to start
understanding the zookeeper codebase.

Thanks again,
Neha

On Thursday, November 10, 2011, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> Ok, patch posted that fixes this (1208). Committers please take a look.
>
> Neha you might want to give a patched version a try. Awesome job
> helping to document and track down this issue. Thanks!
>
> Patrick
>
> On Thu, Nov 10, 2011 at 4:43 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>> See my update to 1208 for a test that demonstrates this.
>>
>> On Thu, Nov 10, 2011 at 3:31 PM, Neha Narkhede <[EMAIL PROTECTED]>
wrote:
>>> Thanks Patrick for looking into this issue !
>>>
>>>>> The logs would indicate if an election happens. Look for "LOOKING" or
>>> "LEADING" or "FOLLOWING".
>>>
>>> The logs don't have any such entries. So I'm guessing there was no
election
>>> happening.
>>>
>>> Do you have thoughts, though, on how easy it would be to reproduce this
>>> bug, to verify the bug fix ?
>>>
>>> Thanks,
>>> Neha
>>>
>>>
>>> On Thu, Nov 10, 2011 at 2:08 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Thu, Nov 10, 2011 at 1:52 PM, Neha Narkhede <[EMAIL PROTECTED]
>
>>>> wrote:
>>>> > Thanks for the quick responses, guys! Please find my replies inline -
>>>> >
>>>> >>> 1) Why is the session closed, the client closed it or the cluster
>>>> > expired it?
>>>> > Cluster expired it.
>>>> >
>>>>
>>>> Yes, I realized after that the cxid is 0 in your logs - that indicates
>>>> it was expired and not closed explicitly by the client.
>>>>
>>>>
>>>> >>> 3) the znode exists on all 4 servers, is that right?
>>>> > Yes
>>>> >
>>>>
>>>> This holds up my theory that the PrepRequestProcessor is accepting a
>>>> create from the client after the session has been expired.
>>>>
>>>> >>> 5) why are your max latencies, as well as avg latency, so high?
>>>> >>> a) are these dedicated boxes, not virtualized, correct?
>>>> > these are dedicated boxes, but zk is currently co-located with
kafka, but
>>>> > on different disks
>>>> >
>>>> >>> b) is the jvm going into gc pause? (try turning on verbose
logging, or
>>>> > use "jstat" with the gc options to see the history if you still have
>>>> > those jvms running)
>>>> > I don't believe we had gc logs on these machines. So its unclear.
>>>> >
>>>> >>> d) do you have dedicated spindles for the ZK WAL? If not another
>>>> > process might be causing the fsyncs to pause. (you can use iostat or
>>>> > strace to monitor this)
>>>> > No. The log4j and zk txn logs share the same disks.
>>>> >
>>>> >>> Is that the log from the server that's got the 44sec max latency?
>>>> > Yes.
>>>> >
>>>> >>> This is 3.3.3 ?
>>>> > Yes.
>>>> >
>>>> >>> was there any instability in the quorum itself during this time
>>>> > period?
>>>> > How do I find that out ?
>>>>
>>>> The logs would indicate if an election happens. Look for "LOOKING" or
>>>> "LEADING" or "FOLLOWING".
>>>>
>>>>
>>>> Your comments are consistent with my theory. Seems like a bug in PRP
>>>> session validation to me.
>>>>
>>>> Patrick
>>>>
>>>
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB