-Re: Znodes are really wait-free objects?
Ted Dunning 2012-12-10, 17:58
Wait-free means that from the user's point of view, all operations will
proceed or fail immediately. Synchronization internal to the server to
ensure coherent memory access doesn't really apply here.
The point is that ZK is not a lock manager and there is no corollary to a
wait operation. This is extremely important because locks encourage
timeout designs which are notoriously unreliable (at least in expectation
when implemented by normal human engineers).
The high level operations that have to happen are, roughly,
a) the node receiving your request gets the request and validates that it
appears well formed.
b) if the request is a read operation, it is satisfied by direct access to
c) if the request is a write operation, it is forwarded to the master
d) the master looks at memory and the queue to decide if this operation
will succeed or fail
e) if the operation will fail, the failure is returned
f) if the operation will succeed, then it is converted to an idempotent
operation and sent to the rest of the cluster for committing
g) once a quorum commits the operation, success is returned
h) if a quorum fails to commit the operation in time, failure is returned
This is a moderately long list, but nothing here has any long lived locks
or much chance for starvation. That leads to the claim that all operations
are lock-less from the user point of view.
On Mon, Dec 10, 2012 at 9:45 AM, Dmitry Kuvayskiy <
[EMAIL PROTECTED]> wrote:
> Hello all.
> The paper "ZooKeeper: Wait-free coordination for Internet-scale
> systems" claims that znodes are wait-free objects (citing: ``Our
> system, Zookeeper, hence implements an API that manipulates simple
> _wait-free_ data objects organized hierarchically as in file
> systems''). But I wonder if they are really "wait-free" (in the sense
> that "every operation has a bound on the number of steps the algorithm
> will take before the operation completes")?
> As far as I know, ZooKeeper uses locks (well, "synchronized" keyword)
> internally. I also cannot imagine a system that big and complex
> working only with wait-free data structures. And if we use locks
> internally, how can we state that the external interface is wait-free?
> Also, I didn't find anything about "wait-freedom" on the
> http://zookeeper.apache.org/ site. So can anyone explain a bit, what
> this "wait-free" really means and how it is implemented in the code?
> Yours sincerely,
> Dmitry Kuvayskiy