Zookeeper, mail # user - leader election, scheduled tasks, losing leadership


Eric Pederson 2012-12-09, 04:17
Jordan Zimmerman 2012-12-09, 04:25
Eric Pederson 2012-12-09, 04:49
Jordan Zimmerman 2012-12-09, 04:52
Eric Pederson 2012-12-09, 04:54
Jordan Zimmerman 2012-12-09, 04:57
Eric Pederson 2012-12-09, 04:56
Jordan Zimmerman 2012-12-09, 05:00
Henry Robinson 2012-12-09, 05:02
Jordan Zimmerman 2012-12-09, 05:04
Henry Robinson 2012-12-09, 05:12

Re: leader election, scheduled tasks, losing leadership
Jordan Zimmerman 2012-12-09, 05:18
If your ConnectionStateListener receives SUSPENDED or LOST, you've lost your connection to ZooKeeper. Therefore, you cannot use that same ZooKeeper connection to manage a node that denotes whether the process is running. Only one VM at a time will be running the process, and that process can watch for SUSPENDED/LOST and wind down the task.

> You can't assume that the notification is received locally before another
> leader election finishes elsewhere
Which notification? The ConnectionStateListener is an abstraction on ZooKeeper's watcher mechanism. It's only significant for the VM that is the leader. Non-leaders don't need to be concerned.

-JZ
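A minimal sketch of the pattern described above, assuming the leader's scheduled task can stop between units of work: the task checks a flag before each step, and the flag is cleared as soon as the connection listener reports SUSPENDED or LOST. To keep it self-contained, `ConnState` is a local stand-in for Curator's `ConnectionState`, and `LeaderTaskGuard` and `ScheduledWork` are hypothetical names, not Curator classes. Per Henry's caveat in the quoted message below, this narrows the window in which two VMs run the task but does not close it, because the listener may fire after another VM has already won the election.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in (assumption) for the states a Curator ConnectionStateListener
// receives, defined here so the sketch compiles without Curator.
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Hypothetical guard the leader's scheduled task consults between units of
// work. This is not a Curator class.
final class LeaderTaskGuard {
    private final AtomicBoolean mayRun = new AtomicBoolean(false);

    // Call once the leader recipe (e.g. a LeaderLatch) reports leadership.
    void onLeadershipAcquired() {
        mayRun.set(true);
    }

    // Plays the role of ConnectionStateListener.stateChanged: reacts to
    // connection-state events delivered by the client.
    void stateChanged(ConnState newState) {
        if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
            // The connection is in doubt, so treat leadership as gone.
            mayRun.set(false);
        }
        // RECONNECTED deliberately does not re-enable work: leadership must
        // be confirmed again before onLeadershipAcquired() is called.
    }

    boolean mayRun() {
        return mayRun.get();
    }
}

// Hypothetical scheduled task that winds down cooperatively.
final class ScheduledWork {
    private final LeaderTaskGuard guard;

    ScheduledWork(LeaderTaskGuard guard) {
        this.guard = guard;
    }

    // Performs up to maxSteps units of work and returns how many it
    // completed, stopping as soon as the guard withdraws permission.
    int run(int maxSteps) {
        int done = 0;
        while (done < maxSteps && guard.mayRun()) {
            done++; // one unit of real work would go here
        }
        return done;
    }
}
```

Usage: call `guard.onLeadershipAcquired()` when the latch grants leadership and then `new ScheduledWork(guard).run(n)`; after `guard.stateChanged(ConnState.SUSPENDED)`, further calls to `run` do no work until leadership is re-acquired.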

On Dec 8, 2012, at 9:12 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:

> You can't assume that the notification is received locally before another
> leader election finishes elsewhere (particularly if you are running slowly
> for some reason!), so it's not sufficient to guarantee that the process
> that is running locally has finished before someone else starts another.
>
> It's usually best - if possible - to restructure the system so that
> processes are idempotent to work around these kinds of problem, in
> conjunction with using the kind of primitives that Curator provides.
>
> Henry
>
> On 8 December 2012 21:04, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>
>> This is why you need a ConnectionStateListener. You'll get a notice that
>> the connection has been suspended and you should assume all locks/leaders
>> are invalid.
>>
>> -JZ
>>
>> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:
>>
>>> What about a network disconnection? Presumably leadership is revoked when
>>> the leader appears to have failed, which can be for more reasons than a
>>> VM crash (VM running slow, network event, GC pause, etc.).
>>>
>>> Henry
>>>
>>> On 8 December 2012 21:00, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>
>>>> The leader latch lock is the equivalent of "task in progress". I assume the
>>>> task is running in the same VM as the leader lock. The only reason the VM
>>>> would lose leadership is if it crashes, in which case the process would
>>>> die anyway.
>>>>
>>>> -JZ
>>>>
>>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> If I recall correctly, it was Henry Robinson who gave me the advice to
>>>>> have a "task in progress" check.
>>>>>
>>>>>
>>>>> -- Eric
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I am using Curator LeaderLatch :)
>>>>>>
>>>>>>
>>>>>> -- Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> You might check your leader implementation. Writing a correct leader
>>>>>>> recipe is actually quite challenging due to edge cases. Have a look at
>>>>>>> Curator (disclosure: I wrote it) for an example.
>>>>>>>
>>>>>>> -JZ
>>>>>>>
>>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Actually I had the same thought and didn't consider having to do this
>>>>>>>> until I talked about my project at a Zookeeper User Group a month or so
>>>>>>>> ago, and I was given this advice.
>>>>>>>>
>>>>>>>> I know that I do see leadership being lost/transferred when one of the
>>>>>>>> ZK servers is restarted (not the whole ensemble). And it seems like I've
>>>>>>>> seen it happen even when the ensemble stays totally stable (though I am
>>>>>>>> not 100% sure, as it's been a while since I have worked on this
>>>>>>>> particular application).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Eric
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Why would it lose leadership? The only reason I can think of is if the
>>>>>>>>> ZK cluster goes down. In normal use, the ZK cluster won't go down (I
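Henry's suggestion to make the scheduled work idempotent can be sketched as follows, assuming each run is identified by its scheduled slot and that some shared store offers an atomic insert-if-absent. `RunIds` and `RunLedger` are hypothetical names, and the in-memory `ConcurrentHashMap` key set only stands in for that shared store (for example a database table with a unique key). If a former leader that has not yet seen SUSPENDED/LOST and a newly elected leader both execute the same slot, only the first to record the run id applies its effect.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: derives a run id from the *scheduled* slot rather than
// the wall-clock time of execution, so every VM computes the same id.
final class RunIds {
    private RunIds() {}

    static String forSlot(String jobName, LocalDate slot) {
        return jobName + "@" + slot; // e.g. "nightly-report@2012-12-09"
    }
}

// Hypothetical ledger. The in-memory set stands in for a store shared by all
// VMs that supports an atomic insert-if-absent; a per-VM set would not work.
final class RunLedger {
    private final Set<String> completed = ConcurrentHashMap.newKeySet();
    private final List<String> effects =
            Collections.synchronizedList(new ArrayList<String>());

    // Applies the effect only for the first caller with this run id and
    // returns false otherwise, so a stale leader's duplicate run is a no-op.
    boolean applyOnce(String runId, String effect) {
        if (!completed.add(runId)) {
            return false;
        }
        // In a real store, recording the id and applying the effect should
        // happen in one transaction so a crash cannot separate them.
        effects.add(effect);
        return true;
    }

    // Returns a snapshot; copying a synchronized list requires holding its lock.
    List<String> effects() {
        synchronized (effects) {
            return new ArrayList<>(effects);
        }
    }
}
```

Keying on the scheduled slot rather than on which VM currently believes it is leader is what makes an overlap harmless: correctness then depends on the store's conditional write being atomic, not on two leaders never running at the same time.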

Henry Robinson 2012-12-09, 05:30
Jordan Zimmerman 2012-12-09, 05:41
Eric Pederson 2012-12-09, 21:42
Eric Pederson 2012-12-09, 22:10
Vitalii Tymchyshyn 2012-12-10, 06:49
Eric Pederson 2012-12-10, 11:52
Vitalii Tymchyshyn 2012-12-11, 20:09
Eric Pederson 2012-12-12, 00:54
Henry Robinson 2012-12-09, 04:59
Eric Pederson 2012-12-09, 05:00