Zookeeper >> mail # user >> Getting confused with the "recipe for lock"


Hulunbier 2013-01-15, 02:28
Re: Getting confused with the "recipe for lock"
Hi Vitalii,

Thanks a lot, got your idea.

Suppose we measure the time of events from outside the system (ZK and
clients).

And we have no client-side time-tracking routine.

And t_i < t_k if i < k.

t_0 :

client1 has created lock/node1, client2 has created lock/node2;
client1 believes it holds the lock; client2 does not, and is watching
lock/node1.

t_1 :

ZK decides that client1's session has timed out (let's say client1
actually failed to send its heartbeat message on time because of a long
JVM GC pause).

ZK deletes lock/node1,
sends a timeout message to client1,
sends a "node_not_exist" message to client2 (or sends this message before
the deletion, but that does not matter in our case).

But for some reason the link between ZK and client1 becomes very unstable:
high packet loss and a large number of packet retransmissions,
which lead to a significant packet transmission delay (between client1
and ZK only), while the TCP connection is NOT broken.

t_2:

client2 gets the "node_not_exist" event and issues the getChildren command.

t_3:

client2 finds that the only node is lock/node2, concludes that it holds
the lock, and begins acting as the lock owner.

(At the same time, client1 also still believes it holds the lock.)

t_4:

The session_timeout message has not reached client1 yet.

client1's JVM GC completes, and client1 does something as the lock owner.

t_5:

The network becomes stable and, finally, the session_timeout message sent
by ZK reaches client1;

client1 now knows it no longer holds the lock, but it is too late:
it has already done something really bad between t_4 and t_5.
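
The timeline above can be replayed as a small deterministic simulation. This is only a sketch under the assumptions of the scenario (no client-side time tracking, expiry notice delayed until t_5); the `Client` class and `run_timeline` function are hypothetical names made up for illustration, and nothing here talks to a real ZooKeeper server.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical model of a lock client: only tracks what it believes."""
    name: str
    believes_owner: bool = False
    actions: list = field(default_factory=list)

    def act(self, t):
        # A client touches the protected resource only while it
        # believes it owns the lock.
        if self.believes_owner:
            self.actions.append(t)


def run_timeline():
    lock_children = {"node1", "node2"}  # children of the lock node in ZK
    client1 = Client("client1", believes_owner=True)  # t_0: holds the lock
    client2 = Client("client2")                       # t_0: watches node1
    overlap = []  # steps at which both clients believe they own the lock

    for t in range(6):
        if t == 1:
            # ZK expires client1's session and deletes its ephemeral node.
            # The notice to client1 is stuck on the bad link until t_5.
            lock_children.discard("node1")
        elif t == 3:
            # client2 got the watch event at t_2; getChildren now shows
            # that its own node is the lowest one.
            if min(lock_children) == "node2":
                client2.believes_owner = True
        elif t == 4:
            # client1's GC pause ends; it has heard nothing from ZK yet.
            client1.act(t)
        elif t == 5:
            # The delayed session_timeout message finally arrives.
            client1.believes_owner = False

        if client1.believes_owner and client2.believes_owner:
            overlap.append(t)

    return client1, client2, overlap


if __name__ == "__main__":
    c1, c2, overlap = run_timeline()
    print("steps with two owners:", overlap)
    print("client1 acted as owner at:", c1.actions)
```

In this model both clients believe they own the lock at t_3 and t_4, and client1 acts as owner at t_4, which is the window described above.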

--------------------------

Sorry for the grammar, I am not a native English speaker.

On Mon, Jan 14, 2013 at 11:38 PM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:
> There are two events: disconnected and session expired. The ephemeral nodes
> are removed after the second one. The client receives both. So to
> implement an "at most one lock holder" scheme, the client owning the lock
> must assume it has lost lock ownership as soon as it receives the
> disconnected event. So there is a period of time between disconnect and
> session expiry when no one should have the lock. It is "safety" time to
> accommodate time shifts, network latencies, and the lock ownership recheck
> interval (in case the client can't stop using the resource immediately and
> simply checks regularly whether it still holds the lock).
>
>
>
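
The rule in the quoted paragraph (treat "disconnected" as loss of ownership, before the session expires) can be sketched as a small state holder. This is a minimal illustration, not ZooKeeper client code: `SessionEvent` and `LockHolder` are hypothetical names, with event values loosely mirroring ZooKeeper's Disconnected, SyncConnected and Expired session states. It also assumes the event is delivered and processed in time, which is exactly what the scenario earlier in this message calls into question.

```python
from enum import Enum


class SessionEvent(Enum):
    # Hypothetical stand-ins for the session state changes a client sees.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class LockHolder:
    """Conservative view of lock ownership on the client side."""

    def __init__(self):
        self.owns_lock = False
        self.needs_recheck = False

    def acquired(self):
        # Called once the client has confirmed its node is the lowest one.
        self.owns_lock = True
        self.needs_recheck = False

    def on_event(self, event):
        if event is SessionEvent.DISCONNECTED:
            # Stop acting as owner right away; the session may still be
            # alive, so ownership must be re-verified after reconnecting.
            self.owns_lock = False
            self.needs_recheck = True
        elif event is SessionEvent.EXPIRED:
            # The ephemeral node is gone; the lock must be re-acquired.
            self.owns_lock = False
            self.needs_recheck = False
        # SYNC_CONNECTED alone does not restore ownership: the client
        # has to check its node again and then call acquired().

    def may_use_resource(self):
        return self.owns_lock


if __name__ == "__main__":
    holder = LockHolder()
    holder.acquired()
    holder.on_event(SessionEvent.DISCONNECTED)
    print(holder.may_use_resource())
```

The design choice here is that reconnecting never restores ownership by itself; the client has to confirm its lock node still exists first.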
> 2013/1/14 Hulunbier <[EMAIL PROTECTED]>
>
>> Hi Vitalii,
>>
>> > I don't see why clock must be in sync.
>>
>> I don't see any reason to precisely sync the clocks either (but if we
>> could ... that would be wonderful.).
>>
>> By *some constraints on clock drift*, I mean:
>>
>> "Every node has a clock, and all clocks increase at the same rate"
>> or
>> "the server’s clock advances no faster than a known constant factor
>> faster than the client’s.".
>>
>>
>> >Also note the difference between disconnected and session
>> > expired events. This time difference is when client knows "something's
>> > wrong", but another client did not get a lock yet.
>>
>> Sorry, but I did not quite get your idea; would you please give me
>> some further explanation?
>>
>>
>> On Mon, Jan 14, 2013 at 6:37 PM, Vitalii Tymchyshyn <[EMAIL PROTECTED]>
>> wrote:
>> > I don't see why clock must be in sync. They are counting time periods
>> > (timeouts). Also note the difference between disconnected and session
>> > expired events. This time difference is when client knows "something's
>> > wrong", but another client did not get a lock yet. You will have problems
>> > if client can't react (and release resources) between these two events.
>> >
>> > Best regards, Vitalii Tymchyshyn
>> >
>> >
>> > 2013/1/13 Hulunbier <[EMAIL PROTECTED]>
>> >
>> >> Thanks Jordan,
>> >>
>> >> > Assuming the clocks are in sync between all participants…
>> >>
>> >> imho, perfect clock synchronization in a distributed system is very
>> >> hard (if it is possible at all).
>> >>
>> >> > Someone with better understanding of ZK internals can correct me, but
>> >> this is my understanding.
>> >>
>> >> I think I might have missed some very important and subtile(or