Re: Does Zookeeper Lock Entire Znode Tree During a Write
Narayanan Arunachalam 2012-05-15, 20:09
If you are not using sequence numbers for the nodes you create, you can use the version attribute to prevent the second client from overwriting the first one's update. Basically, the set fails if the version is not the same as when you queried the node.
Create will fail too if the node is already present.
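To make the version check concrete, here is a minimal sketch using a toy in-memory store. It is not the ZooKeeper client API: the names ToyStore, BadVersion, NodeExists and demo are made up for illustration, and the only behaviour modelled is "set fails on a stale version" and "create fails if the node exists".

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class NodeExists(Exception):
    """Raised when create targets a path that already exists."""


class ToyStore:
    """In-memory stand-in for a znode store (hypothetical, not the real client)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise NodeExists(path)
        self._nodes[path] = (data, 0)  # a new node starts at version 0

    def get(self, path):
        return self._nodes[path]  # (data, version)

    def set(self, path, data, expected_version):
        _, current = self._nodes[path]
        if expected_version != current:
            raise BadVersion(f"expected {expected_version}, found {current}")
        self._nodes[path] = (data, current + 1)
        return current + 1


def demo():
    store = ToyStore()
    store.create("/doug", b"initial")

    # Both clients read the node and both see version 0.
    _, version_seen_by_a = store.get("/doug")
    _, version_seen_by_b = store.get("/doug")

    # Client A writes first; the version moves from 0 to 1.
    store.set("/doug", b"from A", version_seen_by_a)

    # Client B still holds version 0, so its write is rejected.
    try:
        store.set("/doug", b"from B", version_seen_by_b)
        second_write = "applied"
    except BadVersion:
        second_write = "rejected"

    # Creating a node that already exists is rejected as well.
    try:
        store.create("/doug", b"again")
        second_create = "applied"
    except NodeExists:
        second_create = "rejected"

    return store.get("/doug"), second_write, second_create
```

After a rejected set, the losing client would normally re-read the node to get the current version and decide whether to retry.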
If you are using sequence numbers, the client should check whether its node is the lowest in the list of nodes created. If not, it waits until it becomes the lowest.
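The "am I the lowest" check can be sketched as a pure function over the list of child names. This assumes the children are named with a zero-padded numeric suffix after the last "-" (for example lock-0000000001); the function names are made up for illustration. Waiting on the node immediately ahead of yours, rather than on the whole list, is the approach described in the standard lock recipe, not something stated in this thread.

```python
def sequence_number(name):
    """Return the numeric suffix of a child name, e.g. 'lock-0000000003' -> 3."""
    return int(name.rsplit("-", 1)[1])


def lock_status(children, mine):
    """Return (holds_lock, node_to_wait_on) for the child named `mine`.

    `children` is the list of child names under the lock parent, in any order.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(mine)
    if index == 0:
        return True, None  # lowest sequence number: this client holds the lock
    return False, ordered[index - 1]  # otherwise wait for the node just ahead
```

A client would call this after listing the children of the parent, and call it again each time the node it is waiting on goes away.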
On May 15, 2012, at 9:23 AM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> On 14 May 2012 21:15, dmly <[EMAIL PROTECTED]> wrote:
>> Does ZK lock the entire znode tree during a write? Or does ZK just lock the
>> topmost znode that a client is connecting to?
>> For example:
>> When I connect to "/doug" and create the "/doug/lock-001" node and do an
>> update, is "/" locked or just "/doug"?
> The short answer is no, ZK does not lock the whole tree. If you look at
> ZKDatabase.java and DataTree.java you can see in processTxn etc. that
> only the node being written to (the parent node in the case of a create
> transaction) is 'locked' in the Java sense.
> However the reason you're probably asking is that you're wondering about
> concurrency of operations, and whether a write potentially blocks another
> write from succeeding. This makes sense from a traditional database
> perspective where transactions are potentially long-running, and so
> fine-grained locking is needed to allow concurrent access to disjoint parts
> of a single table, for example.
> In ZooKeeper, all write operations are serialised - not just in the
> logical, equivalent-to-a-sequential-history sense, but in the sense that
> they are all executed one after the other. So in that sense a write 'locks'
> the whole tree, because until it's completed in memory, any subsequent
> write to memory won't take place.
> That's not to say that ZK doesn't provide some concurrency though.
> Transactions are multi-stage operations, and it's possible to pipeline
> these stages so that e.g. disk and CPU can be fully used at the same time.
> For example, another, earlier, stage of a write operation is logging the
> request to disk, for fault-tolerance purposes. It is possible for a later
> write to log itself while an earlier write is happening in memory. ZK takes
> advantage of this pipelining (see the RequestProcessor classes) so you can
> have several transactions 'in flight' at the same time, but they are all
> issued and processed in strict sequential order.
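The pipelining described above can be illustrated with a toy two-stage model: one thread "logs" each write and hands it on, another applies it to an in-memory tree. This is only a sketch of the idea, not ZooKeeper's RequestProcessor chain; run_pipeline, the list standing in for the disk log, and the counter used as a transaction id are all made up for illustration.

```python
import queue
import threading


def run_pipeline(writes):
    """Push writes through a log stage and an apply stage, one thread each."""
    to_log = queue.Queue()
    to_apply = queue.Queue()
    log = []      # stand-in for the on-disk transaction log
    tree = {}     # stand-in for the in-memory data tree
    applied = []  # transaction ids in the order they reached the tree

    def log_stage():
        while True:
            item = to_log.get()
            if item is None:          # sentinel: pass it on and stop
                to_apply.put(None)
                return
            log.append(item)          # "write to disk"
            to_apply.put(item)        # hand off; the next write can be logged now

    def apply_stage():
        while True:
            item = to_apply.get()
            if item is None:
                return
            txn_id, path, data = item
            tree[path] = data         # one write at a time touches the tree
            applied.append(txn_id)

    threads = [threading.Thread(target=log_stage),
               threading.Thread(target=apply_stage)]
    for t in threads:
        t.start()
    # Assign ids in arrival order; FIFO queues preserve that order end to end.
    for txn_id, (path, data) in enumerate(writes, start=1):
        to_log.put((txn_id, path, data))
    to_log.put(None)
    for t in threads:
        t.join()
    return log, tree, applied
```

While the apply thread is updating the tree for one write, the log thread is free to work on the next one, so several writes are in flight at once, yet each stage sees them in the same order.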
> Henry Robinson
> Software Engineer