Re: Getting confused with the "recipe for lock"
Thanks!

I do agree with you that Client1 will eventually know that the lock is
invalid by tracking disconnection and elapsed time.

But,

1. Clocks cannot be precisely synchronized between machines; it is likely
that Client1 will detect the session timeout (via its timer thread) only
after the server has already expired Client1's session and Client2 already
believes it holds the lock.

So, within a small time gap, more than one client may believe it is
holding the lock.

2. Thus, the lock protocol still cannot guarantee exclusiveness; is
it... er... broken?
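
For what it's worth, below is roughly the timer-based detection you describe, as I
understand it. It is only a sketch: the class and method names are made up, and the
watcher is assumed to be registered as the default watcher when constructing the
ZooKeeper handle, so it receives the connection-state events.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Hypothetical sketch: flags the lock as possibly invalid once the session
// timeout has elapsed after a disconnect, and interrupts the lock holder.
public class LockValidityWatcher implements Watcher {

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long sessionTimeoutMs;   // same value passed to the ZooKeeper constructor
    private final Thread lockHolder;       // the thread that created the lock node
    private volatile boolean lockMaybeInvalid = false;
    private ScheduledFuture<?> pending;

    public LockValidityWatcher(long sessionTimeoutMs, Thread lockHolder) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lockHolder = lockHolder;
    }

    @Override
    public synchronized void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
                // The server may expire our session while we are away: start the clock.
                pending = timer.schedule(this::giveUpLock,
                        sessionTimeoutMs, TimeUnit.MILLISECONDS);
                break;
            case SyncConnected:
                // Reconnected in time: the session (and therefore the lock) survived.
                if (pending != null) pending.cancel(false);
                break;
            case Expired:
                giveUpLock();
                break;
            default:
                break;
        }
    }

    private synchronized void giveUpLock() {
        lockMaybeInvalid = true;   // the long-running task should check this flag
        lockHolder.interrupt();    // and/or get interrupted right away
    }

    public boolean isLockMaybeInvalid() {
        return lockMaybeInvalid;
    }
}

Of course, this timer runs on Client1's clock while the session is expired on the
server's clock, which is exactly the gap I am worried about.
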

On Fri, Jan 11, 2013 at 10:48 PM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Yes, this scenario is very likely.
> But it matters only for long-running tasks (longer than the session timeout);
> for short-lived tasks the lock will be released before the session timeout,
> surely.
>
> In the case of long-lived locks, Client1 should track disconnection from the zk
> cluster and assume that the lock has been abandoned (and somehow notify the lock
> owner about that). The client knows the session timeout value and can spawn a
> timer, then act according to its program logic. For example, it can interrupt the
> thread that created the lock and raise a flag, so the long-running task knows the
> lock is no longer valid.
>
>
> On Fri, Jan 11, 2013 at 5:46 PM, Zhao Boran <[EMAIL PROTECTED]> wrote:
>
> > While reading ZooKeeper's recipe for
> > locks (http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks),
> > I get confused:
> >
> > It seems that this recipe for a distributed lock cannot guarantee that *"at any
> > snapshot in time no two clients think they hold the same lock"*.
> >
> > But since ZooKeeper is so widely adopted, if there were such a mistake in
> > the reference doc, someone would have pointed it out a long time ago.
> >
> > So, what did I misunderstand? Please help me!
> >
> > Recipe-for-distributed-lock (from
> > http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks)
> >
> > Locks
> >
> > Fully distributed locks that are globally synchronous, *meaning at any
> > snapshot in time no two clients think they hold the same lock*. These can
> > be implemented using ZooKeeper. As with priority queues, first define a
> > lock node.
> >
> >    1. Call create( ) with a pathname of "*locknode*/guid-lock-" and the
> >       sequence and ephemeral flags set.
> >    2. Call getChildren( ) on the lock node without setting the watch flag
> >       (this is important to avoid the herd effect).
> >    3. If the pathname created in step 1 has the lowest sequence number
> >       suffix, the client has the lock and the client exits the protocol.
> >    4. The client calls exists( ) with the watch flag set on the path in the
> >       lock directory with the next lowest sequence number.
> >    5. If exists( ) returns false, go to step 2. Otherwise, wait for a
> >       notification for the pathname from the previous step before going to
> >       step 2.
> >
> > Consider the following case:
> >
> >    - Client1 successfully acquired the lock (in step 3), with zk node
> >      "locknode/guid-lock-0";
> >
> >    - Client2 created node "locknode/guid-lock-1", failed to acquire the
> >      lock, and is watching "locknode/guid-lock-0";
> >
> >    - Later, for some reason (network congestion?), Client1 failed to send
> >      heartbeat messages to the zk cluster on time, but Client1 is still
> >      working perfectly and assumes it is still holding the lock.
> >
> >    - But ZooKeeper may think Client1's session has timed out, and then:
> >      1. deletes "locknode/guid-lock-0"
> >      2. sends a notification to Client2 (or sends the notification first?)
> >      3. but cannot send a "session timeout" notification to Client1 in time
> >         (due to network congestion?)
> >
> >    - Client2 gets the notification, goes to step 2, finds the only node
> >      "locknode/guid-lock-1", which was created by itself; thus, Client2
> >      assumes it holds the lock.
> >
> >    - But at the same time, Client1 also assumes it holds the lock.
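
For reference, here is a minimal Java sketch of the five recipe steps quoted above.
The class and helper names are made up, and error handling, retries, and the
session-expiry problem discussed in this thread are all omitted.

import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LockRecipeSketch {

    // The sequence number ZooKeeper appends is the last 10 digits of the name.
    private static long seq(String child) {
        return Long.parseLong(child.substring(child.length() - 10));
    }

    // Blocks until this client holds the lock; returns the path of its lock node.
    static String acquire(ZooKeeper zk, String lockNode, String guid)
            throws KeeperException, InterruptedException {
        // Step 1: create "locknode/guid-lock-" with the sequence and ephemeral flags set.
        String myPath = zk.create(lockNode + "/" + guid + "-lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myPath.substring(myPath.lastIndexOf('/') + 1);

        while (true) {
            // Step 2: getChildren( ) on the lock node without setting the watch flag.
            List<String> children = zk.getChildren(lockNode, false);
            children.sort(Comparator.comparingLong(LockRecipeSketch::seq));

            // Step 3: lowest sequence number suffix -> this client has the lock.
            if (myName.equals(children.get(0))) {
                return myPath;
            }

            // Step 4: exists( ) with a watch on the node with the next lowest sequence number.
            String previous = children.get(children.indexOf(myName) - 1);
            CountDownLatch gone = new CountDownLatch(1);
            Stat stat = zk.exists(lockNode + "/" + previous, event -> gone.countDown());

            // Step 5: if it no longer exists, go back to step 2; otherwise wait for
            // the notification on that path, then go back to step 2.
            if (stat != null) {
                gone.await();
            }
        }
    }
}

Watching only the immediately preceding node in step 4, rather than the whole lock
directory, is what avoids the herd effect mentioned in step 2.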