Zookeeper, mail # user - leader election, scheduled tasks, losing leadership


Re: leader election, scheduled tasks, losing leadership
Jordan Zimmerman 2012-12-09, 05:04
This is why you need a ConnectionStateListener. It is notified when the connection has been suspended, and at that point you should assume that all locks and leaderships you hold are invalid.

-JZ
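
A minimal sketch of the pattern described above, in plain Java so that it runs without any dependencies. ConnState and LeadershipGuard are hypothetical names introduced for illustration. ConnState is a local stand-in for a subset of Curator's ConnectionState values. In a real application the onConnectionStateChanged call would presumably be made from a Curator ConnectionStateListener registered on the client, and onLeadershipAcquired from whatever callback or check the election recipe provides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a subset of Curator's ConnectionState values. */
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

/** Hypothetical helper: tracks whether this process may still act as leader. */
class LeadershipGuard {
    private final AtomicBoolean mayLead = new AtomicBoolean(false);

    /** Call when the election recipe reports that leadership was acquired. */
    void onLeadershipAcquired() {
        mayLead.set(true);
    }

    /** Call on every connection state change reported by the client. */
    void onConnectionStateChanged(ConnState newState) {
        // Once the connection is suspended or lost, assume leadership is gone.
        if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
            mayLead.set(false);
        }
        // RECONNECTED deliberately does not restore leadership here:
        // the process has to win the election again first.
    }

    boolean mayLead() {
        return mayLead.get();
    }

    /** Runs one unit of work only if leadership is still assumed; returns whether it ran. */
    boolean runIfLeader(Runnable step) {
        if (!mayLead.get()) {
            return false;
        }
        step.run();
        return true;
    }
}
```

A scheduled task would split its work into small steps and pass each one through runIfLeader, stopping as soon as it returns false. The check narrows the window in which a stale leader keeps working but does not close it, since a step that is already running is not interrupted.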

On Dec 8, 2012, at 9:02 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:

> What about a network disconnection? Presumably leadership is revoked when
> the leader appears to have failed, which can be for more reasons than a VM
> crash (VM running slow, network event, GC pause etc).
>
> Henry
>
> On 8 December 2012 21:00, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>
>> The leader latch lock is the equivalent of a "task in progress" check. I
>> assume the task is running in the same VM as the leader lock. The only
>> reason the VM would lose leadership is if it crashes, in which case the
>> process would die anyway.
>>
>> -JZ
>>
>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>
>>> If I recall correctly it was Henry Robinson that gave me the advice to have
>>> a "task in progress" check.
>>>
>>>
>>> -- Eric
>>>
>>>
>>>
>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>
>>>> I am using Curator LeaderLatch :)
>>>>
>>>>
>>>> -- Eric
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You might check your leader implementation. Writing a correct leader
>>>>> recipe is actually quite challenging due to edge cases. Have a look at
>>>>> Curator (disclosure: I wrote it) for an example.
>>>>>
>>>>> -JZ
>>>>>
>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Actually I had the same thought and didn't consider having to do this until
>>>>>> I talked about my project at a Zookeeper User Group a month or so ago and I
>>>>>> was given this advice.
>>>>>>
>>>>>> I know that I do see leadership being lost/transferred when one of the ZK
>>>>>> servers is restarted (not the whole ensemble). And it seems like I've
>>>>>> seen it happen even when the ensemble stays totally stable (though I am not
>>>>>> 100% sure as it's been a while since I have worked on this particular
>>>>>> application).
>>>>>>
>>>>>>
>>>>>>
>>>>>> -- Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Why would it lose leadership? The only reason I can think of is if the ZK
>>>>>>> cluster goes down. In normal use, the ZK cluster won't go down (I assume
>>>>>>> you're running 3 or 5 instances).
>>>>>>>
>>>>>>> -JZ
>>>>>>>
>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> During the time the task is running a cluster member could lose its
>>>>>>>> leadership.
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679