Zookeeper, mail # user - curator leader reconnect


Re: curator leader reconnect
Jordan Zimmerman 2012-02-07, 21:26
I really appreciate your help, Hartmut. You have, indeed, found a bug. My
test case didn't precisely replicate your situation. I updated the test so
that it did (the lock node getting deleted after session expiration) and
the same problem appeared. You also found the location of the bug, making
my job very easy ;)

Thanks again - I'll push a fix and get a new build out soon.

-Jordan

P.S. I've pasted this thread on Github for others' benefit:
https://github.com/Netflix/curator/issues/24
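
For readers following the thread: the failure mode described in the quoted message below can be modeled without Curator or ZooKeeper. The sketch is a toy, not Curator's code. `FakeZk`, `ToyMutex`, and the `clearStateOnFailure` flag are hypothetical names invented for illustration, and re-entrancy is reduced to a single non-null field. It only shows how local lock state that survives a failed release can make a later acquire succeed without creating a node. Clearing the state in a `finally` block is one possible remedy, not necessarily the fix that went into Curator.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleLockSketch {
    /** Stand-in for the ZooKeeper ensemble: just a set of ephemeral node paths. */
    static class FakeZk {
        final Set<String> nodes = new HashSet<>();
        boolean connected = true;

        void create(String path) {
            requireConnection();
            nodes.add(path);
        }

        void delete(String path) {
            requireConnection();
            nodes.remove(path);
        }

        /** Session expiry: the connection drops and all ephemeral nodes vanish. */
        void expireSession() {
            connected = false;
            nodes.clear();
        }

        void reconnect() {
            connected = true;
        }

        private void requireConnection() {
            if (!connected) {
                throw new IllegalStateException("connection lost");
            }
        }
    }

    /** Toy mutex; a non-null lockData means "this client believes it holds the lock". */
    static class ToyMutex {
        private final FakeZk zk;
        private final String path;
        private final boolean clearStateOnFailure; // hypothetical switch: false = buggy variant
        private String lockData;

        ToyMutex(FakeZk zk, String path, boolean clearStateOnFailure) {
            this.zk = zk;
            this.path = path;
            this.clearStateOnFailure = clearStateOnFailure;
        }

        void acquire() {
            if (lockData != null) {
                return; // re-entrant path: granted locally, nothing is written to the server
            }
            zk.create(path);
            lockData = path;
        }

        void release() {
            if (lockData == null) {
                return;
            }
            try {
                zk.delete(path); // throws while disconnected
                lockData = null; // never reached when delete throws
            } finally {
                if (clearStateOnFailure) {
                    lockData = null; // drop local state even if the delete failed
                }
            }
        }
    }

    /** Runs LOST -> failed release -> RECONNECT -> acquire; reports whether the node exists. */
    static boolean nodeExistsAfterReconnect(boolean clearStateOnFailure) {
        FakeZk zk = new FakeZk();
        ToyMutex mutex = new ToyMutex(zk, "/leader/lock", clearStateOnFailure);
        mutex.acquire();
        zk.expireSession();
        try {
            mutex.release();
        } catch (IllegalStateException expected) {
            // the release failed because the connection is gone
        }
        zk.reconnect();
        mutex.acquire();
        return zk.nodes.contains("/leader/lock");
    }

    public static void main(String[] args) {
        System.out.println("buggy: node exists = " + nodeExistsAfterReconnect(false));
        System.out.println("fixed: node exists = " + nodeExistsAfterReconnect(true));
    }
}
```

In the buggy variant the second acquire() returns through the re-entrant branch, so the client believes it holds the lock while the server has no node. That matches the symptom reported in the thread: takeLeadership() is called but no ephemeral node exists.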

On 2/7/12 12:39 PM, "Hartmut Lang" <[EMAIL PROTECTED]> wrote:

>Jordan, thanks for looking into this.
>
>I cloned the code and had a look. As far as I can tell, your test case
>verifies that you get the leadership again after the RECONNECT happens.
>This is also the case in my code.
>But how does it check that there is a related lock/ephemeral node in the
>ZK cluster? There is none in my case.
>
>I did some debugging:
>If the connection is lost in InterProcessMutex.release(), the releaseLocks
>call will throw an exception, right?
>So the lockData is not(!) set to null (line#130).
>When InterProcessMutex.acquire() is then called after the RECONNECT, it
>is treated as re-entering.
>So the lock is just granted, without re-creating the lock in the ZK cluster.
>This does not seem right to me.
>But I'm the newbie here.
>
>Would be great if you can have a look.
>
>/Hartmut
>
>On 7 February 2012 09:05, Jordan Zimmerman
><[EMAIL PROTECTED]> wrote:
>
>> I just pushed a test that simulates the situation you describe and it
>> works correctly. Can you please have a look at it and see what's
>> different about your case?
>>
>> TestLeaderSelectorCluster.java
>>    testLostRestart()
>> ________________________________________
>> From: Hartmut Lang [[EMAIL PROTECTED]]
>> Sent: Monday, February 06, 2012 9:55 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: curator leader reconnect
>>
>> Well, I use the CLI client to connect to the ZK cluster, and I see no
>> entry.
>>
>> My setup:
>> I have a cluster of three ZK-nodes.
>> I have a client starting LeaderSelector, which is connected to one
>> cluster-node.
>> I see the ephemeral node.
>>
>> I stop the cluster node the client is connected to. The client finally
>> sees a LOST event. The ephemeral node is gone (checked using the CLI).
>> I start the cluster node again. The client sees the RECONNECT and calls
>> start(). And then takeLeadership() is called.
>> But there is no ephemeral node in the cluster.
>>
>> /Hartmut
>>
>>
>> On 6 February 2012 18:46, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>
>> > How are you verifying that there is no ephemeral node?
>> >
>> > -Jordan
>> >
>> > On 2/6/12 9:28 AM, "Hartmut Lang" <[EMAIL PROTECTED]> wrote:
>> >
>> > >Hi Jordan,
>> > >
>> > >Thanks for the information.
>> > >What I see in my LeaderSelector example is this:
>> > >when I just call the start() method after RECONNECT, the
>> > >takeLeadership() method is called again.
>> > >But no ephemeral node exists in the ZK cluster for my client, so
>> > >this does not seem right.
>> > >What could I be doing wrong?
>> > >
>> > >/Hartmut
>> > >On 6 February 2012 07:55, Jordan Zimmerman
>> > ><[EMAIL PROTECTED]> wrote:
>> > >
>> > >> No - don't call close. I'm afraid that it's a bit confusing. It
>> > >> was an afterthought. Maybe I should add a restart() method or
>> > >> something.
>> > >>
>> > >> -JZ
>> > >>
>> > >> On 2/5/12 10:48 PM, "Hartmut Lang" <[EMAIL PROTECTED]> wrote:
>> > >>
>> > >> >Thanks for your answer.
>> > >> >If I call start() again on the same instance, should I call
>> > >> >close() before?
>> > >> >
>> > >> >My first attempt was to call close() on the LeaderSelector on a
>> > >> >LOST event.
>> > >> >But then, of course, I do not get the RECONNECT event on the
>> > >> >LeaderSelectorListener again.
>> > >> >
>> > >> >/Hartmut
>> > >> >
>> > >> >On 5 February 2012 23:53, Jordan Zimmerman
>> > >> ><[EMAIL PROTECTED]> wrote:
>> > >> >
>> > >> >> You can either create a new LeaderSelector or call start()