Re: Getting confused with the "recipe for lock"
sorry to jump in the middle, but i thought i'd point out a couple of things.

at the heart of ZK is Zab, which is an atomic broadcast protocol (it
actually has stronger guarantees than just atomic broadcast: it also
guarantees primary order). updates go through this protocol which
gives us sequential consistency for writes.

failure detection uses timeouts, as most failure detectors do, so we
have some assumptions on bounds of message delays and drifts of
clocks. in the end, these assumptions are manifest in the sync and
initial timeouts of the server and the session timeouts of the clients.
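
To make the timeouts above concrete, here is a rough sketch of how they fall out of the server configuration. It assumes the standard zoo.cfg settings tickTime, initLimit and syncLimit, and the default session-timeout bounds of 2 to 20 ticks. The numeric values and the class and method names are illustrative assumptions, not recommendations.

```java
import java.time.Duration;

class ZkTimeoutSketch {
    // Illustrative values only (assumed, in the style of a sample zoo.cfg).
    static final long TICK_TIME_MS = 2000; // tickTime, in milliseconds
    static final int INIT_LIMIT = 10;      // initLimit, in ticks
    static final int SYNC_LIMIT = 5;       // syncLimit, in ticks

    // Time a follower is given to connect to and sync with the leader at startup.
    static Duration initialTimeout() {
        return Duration.ofMillis(TICK_TIME_MS * INIT_LIMIT);
    }

    // How far a follower may fall behind the leader before it is dropped.
    static Duration syncTimeout() {
        return Duration.ofMillis(TICK_TIME_MS * SYNC_LIMIT);
    }

    // The server clamps the session timeout a client asks for.
    // Default bounds (when min/maxSessionTimeout are not set): 2 to 20 ticks.
    static Duration negotiatedSessionTimeout(Duration requested) {
        long min = 2 * TICK_TIME_MS;
        long max = 20 * TICK_TIME_MS;
        long clamped = Math.max(min, Math.min(max, requested.toMillis()));
        return Duration.ofMillis(clamped);
    }

    public static void main(String[] args) {
        System.out.println("initial timeout: " + initialTimeout().toMillis() + " ms");
        System.out.println("sync timeout: " + syncTimeout().toMillis() + " ms");
        System.out.println("session timeout for a 60 s request: "
                + negotiatedSessionTimeout(Duration.ofSeconds(60)).toMillis() + " ms");
    }
}
```

Each of these numbers is a bet on how long a message or a pause can take. Making them larger is the "more conservative" direction, at the cost of slower failure detection.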

as long as the assumptions are true, things will stay consistent. if
the assumptions fail, such as when HBase region servers went into gc
for many minutes and then woke up still thinking they were the leader,
bad things can happen. the fix may be to use more conservative
assumptions or to use a fencing scheme with external resources.
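
One possible shape for such a fencing scheme, as a minimal sketch: the external resource remembers the highest token it has seen and refuses writes that carry an older one. The sketch assumes every lock acquisition yields a strictly larger number (for instance, one derived from the lock node's sequence suffix; that mapping is an assumption here). FencedStore and its methods are hypothetical names, not part of ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

class FencingSketch {
    // Hypothetical external resource that checks a fencing token on every write.
    static class FencedStore {
        private long highestTokenSeen = -1;
        private final List<String> log = new ArrayList<>();

        // Accept the write only if the token is not older than one already seen.
        synchronized boolean write(long fencingToken, String value) {
            if (fencingToken < highestTokenSeen) {
                return false; // stale lock holder, e.g. one that just woke up from a long gc
            }
            highestTokenSeen = fencingToken;
            log.add(value);
            return true;
        }

        synchronized List<String> contents() {
            return new ArrayList<>(log);
        }
    }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        long oldHolder = 1; // acquired the lock first, then paused
        long newHolder = 2; // acquired the lock after the old session expired

        System.out.println(store.write(oldHolder, "a")); // accepted
        System.out.println(store.write(newHolder, "b")); // accepted, raises the bar
        System.out.println(store.write(oldHolder, "c")); // rejected as stale
        System.out.println(store.contents());
    }
}
```

The point of the design is that safety no longer depends on the paused client noticing in time that its session expired: the resource itself turns the late write away.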

if the assumptions are violated by the zookeeper cluster, it will
manifest as a liveness problem rather than a safety issue. (in theory
at least, we do have bugs occasionally :)


On Mon, Jan 14, 2013 at 7:45 PM, Hulunbier <[EMAIL PROTECTED]> wrote:
> Hi Jordan,
>> Why would client 1's connection be unstable but client 2's not? In any normal usage the ZK clients are going to be on the same network. Or, are you thinking cross-data-center usage? In my opinion, ZooKeeper is not suited to cross-data-center usage.
> er... the word "unstable" I used is misleading; a fully functional (or
> stable?) TCP connection is expected to encounter some network
> congestion, and should / can handle this situation well, but possibly
> with some delay in delivering the segments. A high volume of traffic
> in a LAN may lead to this situation, and it is not rare, I
> think.
> Even if there were no such congestion, there is always a time lag
> between ZK sending the session-timeout message and the client
> receiving it.
> Without any assumption, we cannot ensure that the client becomes
> aware that it no longer holds the lock before other clients get the
> node_not_exist notification, successfully execute getChildren, and
> conclude that one of them holds the lock.
> I think in practice, we could (or have to) accept this assumption:
> "the server’s clock advances no faster than a known constant factor
> faster than the client’s".
> But the assumption itself is not enough for the correctness of the lock
> protocol, because the client can only passively wait for the
> session_time_out message; the client may need a timer to explicitly
> check the time elapsed.
> But the recipe claims clearly that:  "at any snapshot in time no two
> clients think they hold the same lock", and "There is no polling or
> timeouts."
>> In any event, as others have pointed out, Zookeeper is _not_ a transactional system.
>> It is an eventually consistent system that will give you a reasonable degree of distributed coordination semantics.
> I should admit that I do not know whether ZK is eventually consistent,
> transactional, or neither. (BTW, there is a recipe for 2pc, and some guys
> claim that *Zab* is sequentially consistent);
> Do these properties of ZK imply that there are assumptions about clock drift?
>>There are edge cases as you describe but they are in the level of noise.
> You might be right, but for me, edge cases are what I am worried about
> (please do not get me wrong, I mean, different applications have
> different requirements / constraints).
>> -Jordan
>> On Jan 14, 2013, at 5:52 PM, Hulunbier <[EMAIL PROTECTED]> wrote:
>>> Hi Vitalii,
>>> Thanks a lot, got your idea.
>>> Suppose we are measuring the time of events from outside the system (zk & clients).
>>> And we have no client-side time-tracking routine.
>>> And t_i < t_k if  i < k
>>> t_0 :
>>> client1 has created lock/node1, client2 has created lock/node2;
>>> client1 thinks it holds the lock; client2 does not, and is watching