Re: Unexpected behavior with Session Timeouts in Java Client
We ran into this exact scenario, and while it would have been nice to
have the timer option implemented internally by ZK, we ended up
implementing it externally ourselves. We start a timer on the
disconnected event, and when it gets close to the session timeout,
we trigger the session-lost behavior on the master.
We may be without a master for a second or two, but that's OK in our
case. As Ted mentioned, without a connection to ZK, there is no way to
time it exactly anyway.
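
A minimal sketch of the external timer described above, using only the
JDK. The class name SessionLossTimer, its methods, and the safetyMarginMs
parameter are hypothetical and not part of the ZooKeeper client API. It
assumes the application calls onDisconnected() and onReconnected() from
its own Watcher when it sees KeeperState.Disconnected and
KeeperState.SyncConnected.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: presumes the session lost shortly before the timeout. */
final class SessionLossTimer {
    // Daemon thread so a pending timer never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-loss-timer");
                t.setDaemon(true);
                return t;
            });
    private final long fireAfterMs;
    private final Runnable onSessionPresumedLost;
    private ScheduledFuture<?> pending;

    SessionLossTimer(long sessionTimeoutMs, long safetyMarginMs,
                     Runnable onSessionPresumedLost) {
        // Fire "close" to the session timeout, never at a negative delay.
        this.fireAfterMs = Math.max(0L, sessionTimeoutMs - safetyMarginMs);
        this.onSessionPresumedLost = onSessionPresumedLost;
    }

    /** Call on the disconnected event; starts the countdown if none is running. */
    synchronized void onDisconnected() {
        if (pending == null || pending.isDone()) {
            pending = scheduler.schedule(
                    onSessionPresumedLost, fireAfterMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Call when the client reconnects; cancels a countdown that has not fired. */
    synchronized void onReconnected() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }
}
```

The callback is where the master would step down. A reconnect that
arrives just as the timer fires can still lose the race, so the callback
should tolerate being run in that case.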

The one advantage of having the session-lost timer running within
zkclient instead of our app is that it could track the timer from the
last actual heartbeat, rather than from the disconnected event.
Depending on the network conditions that caused the disconnection, a
while may pass between when we actually lose connectivity to ZK and
when the disconnected event triggers, so our own timer may not be
super accurate. Having zkclient set a timer based on the last
heartbeat, and triggering the session-lost event when that timer
expires, would be more accurate.
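
The difference between the two starting points comes down to simple
arithmetic, sketched below. The class HeartbeatDeadline and its method
are hypothetical; the sketch assumes the time of the last successful
heartbeat were available to whoever sets the timer, which is the
capability being asked for here.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: time left in the session, counted from the last heartbeat. */
final class HeartbeatDeadline {
    private HeartbeatDeadline() {}

    /**
     * Returns how many milliseconds remain before the session timeout,
     * measured from the last heartbeat rather than from "now".
     * Both timestamps are expected to come from System.nanoTime().
     */
    static long remainingMs(long sessionTimeoutMs,
                            long lastHeartbeatNanos,
                            long nowNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - lastHeartbeatNanos);
        return Math.max(0L, sessionTimeoutMs - elapsedMs);
    }
}
```

With illustrative numbers: for a 10-second session timeout, if the
disconnected event arrives 4 seconds after the last heartbeat, only 6
seconds remain, whereas a timer started at the event would wait up to
the full 10.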

-Dave
On Fri, Apr 22, 2011 at 10:03 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Well there are real limits about what knowledge you can have in a split
> brain and how much coordination there can be.
>
> Having exactly one master in such a situation is impossible.  You get to pick
> your error scenario, however.  One option is to have one master almost all
> the time with a failure mode of having zero acting masters a bit of the
> time.  The other option is to have one master almost all the time with a
> failure mode that has two masters a bit of the time.  You get to pick which
> one.
>
> As Ben stated, the philosophy of ZK is to report facts that can be
> demonstrated.  Your application will work pretty well with a timer even
> though that could result in momentary double master situations.  Of course,
> it can also result in periods of zero master as well since a master cut off
> from ZK may well be cut off from the clients who want to be served.
>
> So the API isn't making a promise it can't keep.  It is promising to report
> to you as soon as it is certain of things.  And it does.
>
> On Fri, Apr 22, 2011 at 6:51 AM, Scott Fines <[EMAIL PROTECTED]> wrote:
>
>> I guess my objection would be that the API is making a promise that it can
>> only deliver part of the time. If the client can't reconnect to ZooKeeper,
>> then the client hasn't expired, which is an unusual state to find oneself
>> in, and in leader-election systems like mine could result in having two
>> practical leaders, while ZooKeeper is insisting that there is only one.
>> This kind of split-brain scenario seems unavoidable in the absence of
>> probabilistic failure checking (like timeouts).
>>
>> The FAQ, I've noticed, does make mention of this phenomenon. Perhaps
>> something should be indicated there regarding the why and not just the
>> mechanics. Otherwise, developers such as myself might find themselves
>> unduly confused by it :)
>>
>> Thanks for all your help,
>>
>