Getting confused with the "recipe for lock"
Zhao Boran 2013-01-11, 13:46
While reading ZooKeeper's recipe for locks
<http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks>,
I got confused:

It seems that this recipe for a distributed lock cannot guarantee that *"at
any snapshot in time no two clients think they hold the same lock"*.

But since ZooKeeper is so widely adopted, if there were such a mistake in
the reference doc, someone would have pointed it out a long time ago.

So, what did I misunderstand? Please help me!

The recipe for the distributed lock (quoted from
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks):

Locks

Fully distributed locks that are globally synchronous, *meaning at any
snapshot in time no two clients think they hold the same lock*. These can
be implemented using ZooKeeper. As with priority queues, first define a
lock node.

   1. Call create( ) with a pathname of "*locknode*/guid-lock-" and the
   sequence and ephemeral flags set.
   2. Call getChildren( ) on the lock node without setting the watch flag
   (this is important to avoid the herd effect).
   3. If the pathname created in step 1 has the lowest sequence number
   suffix, the client has the lock and the client exits the protocol.
   4. The client calls exists( ) with the watch flag set on the path in the
   lock directory with the next lowest sequence number.
   5. If exists( ) returns false, go to step 2. Otherwise, wait for a
   notification for the pathname from the previous step before going to step 2.
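
For what it's worth, here is a minimal sketch in Java of how I read these
five steps against the raw ZooKeeper client API (the class and variable
names are my own, and error handling and connection-loss recovery are
omitted):

import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LockRecipeSketch {
    private final ZooKeeper zk;
    private final String lockNode; // e.g. "/locknode", created beforehand
    private String myPath;         // full path created in step 1

    LockRecipeSketch(ZooKeeper zk, String lockNode) {
        this.zk = zk;
        this.lockNode = lockNode;
    }

    void acquire() throws KeeperException, InterruptedException {
        // Step 1: ephemeral + sequential child under the lock node.
        myPath = zk.create(lockNode + "/guid-lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myPath.substring(lockNode.length() + 1);

        while (true) {
            // Step 2: list children without a watch (avoids the herd effect).
            List<String> children = zk.getChildren(lockNode, false);
            // Order by the zero-padded sequence suffix, not the guid prefix.
            children.sort(Comparator.comparing(
                    (String s) -> s.substring(s.length() - 10)));

            // Step 3: the lowest sequence number holds the lock.
            int myIndex = children.indexOf(myName);
            if (myIndex == 0) {
                return; // lock acquired
            }

            // Step 4: watch only the child with the next lower sequence.
            String predecessor = lockNode + "/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(predecessor, event -> latch.countDown());

            // Step 5: predecessor already gone -> retry; otherwise wait
            // for its deletion notification before retrying.
            if (stat != null) {
                latch.await();
            }
        }
    }
}

My question below is about what a client that has already returned from
acquire() should assume when its session later expires.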

Consider the following case:

   - Client1 successfully acquired the lock (in step 3), with zk node
   "locknode/guid-lock-0".

   - Client2 created node "locknode/guid-lock-1", failed to acquire the
   lock, and is watching "locknode/guid-lock-0".

   - Later, for some reason (network congestion?), client1 fails to send
   heartbeat messages to the zk cluster on time, but client1 is still
   working perfectly and assumes it still holds the lock.

   - But ZooKeeper may decide that client1's session has timed out, and then:
      1. delete "locknode/guid-lock-0";
      2. send a notification to Client2 (or send the notification first?);
      3. but fail to deliver a "session timeout" notification to client1
      in time (due to network congestion?).

   - Client2 gets the notification, goes to step 2, and finds only one node,
   "locknode/guid-lock-1", which it created itself; thus, client2 assumes
   it holds the lock.

   - But at the same time, client1 also assumes it holds the lock.

Is this a valid scenario?
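
To make the timing side of my question concrete: as far as I understand,
a client only learns that its session has expired through a watcher event,
and the Expired event can be delivered only after the client reconnects to
the ensemble; while it is partitioned, client1 sees at most a Disconnected
event. So the best a defensive client1 seems able to do is something like
the rough sketch below (names are mine and purely illustrative):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

// Rough sketch: the lock holder polls maybeLostLock() and stops its
// critical work as soon as the session looks unhealthy.
public class DefensiveLockWatcher implements Watcher {
    private volatile boolean maybeLostLock = false;

    @Override
    public void process(WatchedEvent event) {
        // Expired can only arrive after reconnecting; Disconnected is
        // the earliest hint the client gets during a partition.
        KeeperState state = event.getState();
        if (state == KeeperState.Disconnected
                || state == KeeperState.Expired) {
            maybeLostLock = true;
        }
    }

    public boolean maybeLostLock() {
        return maybeLostLock;
    }
}

Even then, there seems to be a window between the server expiring the
session and client1 noticing it has lost contact, which is exactly the
window the scenario above exploits.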

Thanks a lot!
Replies:
Andrey Stepachev 2013-01-11, 14:48
Hulunbier 2013-01-11, 16:10
Jordan Zimmerman 2013-01-11, 20:20
Hulunbier 2013-01-12, 10:30
Ben Bangert 2013-01-12, 17:39
Jordan Zimmerman 2013-01-13, 01:31
Hulunbier 2013-01-13, 15:05
Vitalii Tymchyshyn 2013-01-14, 10:37
Hulunbier 2013-01-14, 15:06
Vitalii Tymchyshyn 2013-01-14, 15:38
Ted Dunning 2013-01-14, 16:05
Hulunbier 2013-01-15, 02:28
Hulunbier 2013-01-15, 01:52
Jordan Zimmerman 2013-01-15, 02:23
Hulunbier 2013-01-15, 03:45
Benjamin Reed 2013-01-15, 05:27
Hulunbier 2013-01-15, 06:32
Ted Dunning 2013-01-17, 11:43
Hulunbier 2013-01-18, 08:26
Benjamin Reed 2013-01-17, 04:28
Hulunbier 2013-01-17, 09:05
Vitalii Tymchyshyn 2013-01-27, 19:29
Hulunbier 2013-01-13, 14:40