Zookeeper, mail # user - Getting confused with the "recipe for lock"

Re: Getting confused with the "recipe for lock"
Hulunbier 2013-01-15, 03:45
Hi Jordan,

> Why would client 1's connection be unstable but client 2's not? In any normal usage the ZK clients are going to be on the same network. Or, are you thinking cross-data-center usage? In my opinion, ZooKeeper is not suited to cross-data-center usage.

Er... my use of the word "unstable" was misleading. A fully functional (or
stable?) TCP connection can be expected to run into some network
congestion, and should (and can) handle it well, though possibly with
some delay in delivering segments. A high volume of LAN traffic can
lead to this situation, and it is not rare, I

Even if there were no such congestion, there is always a time lag
between ZK sending the session-timeout message and the client receiving the
Without any assumption, we cannot ensure that the client becomes
aware that it no longer holds the lock before the other clients get the
node_not_exist notification, successfully execute getChildren, and
one of them concludes that it holds the lock.

I think in practice we could (or have to) accept this assumption:
"the server’s clock advance no faster than a known constant factor
faster than the client’s".

But the assumption by itself is not enough for the correctness of the lock
protocol: the client can only wait passively for the
session_time_out message, so it may need a timer to explicitly
check how much time has elapsed.
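
To make the timer idea concrete, here is a minimal sketch of such a client-side check. It is not part of the ZooKeeper client API; the class LocalLeaseGuard, its methods, and the parameter names are all made up for illustration. It assumes the drift bound quoted above (the server's clock runs at most k times faster than the client's) and that the server expires a session no earlier than session_timeout_s after it last received a heartbeat.

```python
import time


class LocalLeaseGuard:
    """Hypothetical helper: local estimate of whether the session can still be alive."""

    def __init__(self, session_timeout_s, max_drift_factor, clock=time.monotonic):
        # session_timeout_s: session timeout agreed with the server, in seconds.
        # max_drift_factor: assumed bound k >= 1; the server's clock advances
        # at most k times faster than this client's clock.
        if max_drift_factor < 1.0:
            raise ValueError("max_drift_factor must be >= 1")
        # If e seconds passed locally, at most k * e passed on the server,
        # so the session cannot have expired while e < timeout / k.
        self._safe_window = session_timeout_s / max_drift_factor
        self._clock = clock
        self._last_send = None

    def heartbeat_acknowledged(self, sent_at):
        # Store the local time at which the acknowledged heartbeat was SENT.
        # The server received it no earlier than that, so measuring from the
        # send time is the conservative choice.
        self._last_send = sent_at

    def may_act_as_owner(self):
        # True only while the server cannot yet have expired the session,
        # given the drift assumption. No acknowledged heartbeat means "no".
        if self._last_send is None:
            return False
        return (self._clock() - self._last_send) < self._safe_window


# Usage with a fake clock so the example is deterministic.
now = [100.0]
guard = LocalLeaseGuard(session_timeout_s=10.0, max_drift_factor=1.25,
                        clock=lambda: now[0])
guard.heartbeat_acknowledged(sent_at=100.0)
now[0] = 107.9
print(guard.may_act_as_owner())  # True: 7.9 s elapsed, safe window is 8.0 s
now[0] = 108.0
print(guard.may_act_as_owner())  # False: the window has closed
```

Even with such a guard, a pause (a GC pause, say) between the check and the protected action can still let the action run after the window has closed, so this narrows the problem rather than removing it.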

But the recipe claims clearly that:  "at any snapshot in time no two
clients think they hold the same lock", and "There is no polling or
> In any event, as others have pointed out, Zookeeper is _not_ a transactional system.

> It is an eventually consistent system that will give you a reasonable degree of distributed coordination semantics.

I must admit that I do not know whether ZK is eventually consistent,
transactional, or neither. (BTW, there is a recipe for 2PC, and some people
claim that *Zab* is sequentially consistent.)

Do these properties of ZK imply that there are assumptions about clock drift?

>There are edge cases as you describe but they are in the level of noise.

You might be right, but for me, edge cases are what I am worried about
(please do not get me wrong; I mean that different applications have
different requirements / constraints).

> -Jordan
> On Jan 14, 2013, at 5:52 PM, Hulunbier <[EMAIL PROTECTED]> wrote:
>> Hi Vitalii,
>> Thanks a lot, got your idea.
>> Suppose we measure the times of events from outside the system (ZK & clients),
>> and we have no client-side time-tracking routine,
>> and t_i < t_k if i < k.
>> t_0 :
>> client1 has created lock/node1, client2 has created lock/node2;
>> client1 believes it holds the lock; client2 does not, and is watching
>> lock/node1.
>> t_1 :
>> ZK considers client1's session timed out (let's say client1 actually
>> failed to send its heartbeat message on time because of a long JVM GC
>> pause).
>> ZK deletes lock/node1,
>> sends a timeout message to client1,
>> sends a "node_not_exist" message to client2 (or sends this message before
>> the deletion, but that does not matter in our case),
>> but for some reason the link between ZK and client1 becomes very unstable:
>> high packet loss and a large number of packet retransmissions,
>> which lead to a significant packet transmission delay (between client1
>> and ZK only), but the TCP connection is NOT broken.
>> t_2:
>> client2 gets the "node_not_exist" event and issues the getChildren command.
>> t_3:
>> client2 finds only the node lock/node2, believes it holds the
>> lock, and begins acting as the lock owner
>> (at the same time, client1 also believes it holds the lock).
>> t_4:
>> the session_timeout message has not reached client1 yet;
>> client1's JVM GC completes, and client1 does something as the lock owner.
>> t_5:
>> the network becomes stable, and the session_timeout message sent by
>> ZK finally reaches client1;
>> client1 realizes it no longer holds the lock, but it is too late:
>> it has done something really bad between t_4 and t_5.