Re: Zookeeper performance
Ted Dunning 2013-07-31, 23:14
Generally, ZK is much better as a coordination layer.
Starting with an expected transaction load well above the normal limits of
operation is not a grand idea.
Much better to do something simpler like have ZK coordinate shard masters
that each use conventional methods for handling transactions (see voltdb
for one approach to sharding well to allow each transaction to never span
Similarly, you can also shard and maintain version numbers, transaction
IDs and an in-memory transaction table. This allows multi-shard MVCC
commit semantics, but dealing with transactions stalled by dead nodes
can be a bit tricky.
Using ZK for the raw transaction stream isn't a grand idea, however.
On Wed, Jul 31, 2013 at 4:05 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> So how about the following optimistic approach:
> 1. Read the current version of the database (stored in a znode's version
> metadata). If it is even, wait and try again; even numbers mean someone is
> committing and the DB might be in an inconsistent state. Then read the
> state from the database your update will rely upon (user1.name, in this
> instance). You must also be able to atomically read the current version
> from the database as well as zookeeper, to ensure that the data is from the
> version you think it is. If the DB version does not match the ZK version,
> 2. Once an update is ready to commit, test-and-increment the current
> version in ZK to an even number, write your update to the DB, along with
> the eventual version of the data (the next odd number).
> 3. Increment the current version in ZK to an odd number.
> The even / odd distinction means that you can detect when someone else is
> updating the database, since otherwise there's no way to do so atomically
> with an update to ZK (so another transaction can't tell if you've finished
> your update or not, and so doesn't know when to wait until).
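
A rough sketch of the three steps above, for illustration only. It assumes an in-memory stand-in for the znode version (VersionCell) and a toy Database; both are hypothetical names, not ZooKeeper APIs. In a real deployment the test-and-increment would be a conditional write against the znode's expected version, and the DB write would have to be durable.

```python
import threading


class VersionCell:
    """In-memory stand-in (hypothetical) for the version kept in a znode."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 1  # odd: no commit in flight

    def read(self):
        with self._lock:
            return self._version

    def test_and_increment(self, expected):
        """Bump the version only if it still equals `expected`."""
        with self._lock:
            if self._version != expected:
                return False
            self._version += 1
            return True


class Database:
    """Toy store that records which version its contents belong to."""

    def __init__(self):
        self.rows = {"user1.name": "Kathy"}
        self.version = 1


def begin(cell, db):
    """Step 1: return the version to work from, or None if the caller must wait."""
    seen = cell.read()
    if seen % 2 == 0 or db.version != seen:
        return None  # a commit is in flight, or DB and ZK disagree
    return seen


def commit(cell, db, seen, key, value):
    """Steps 2 and 3: claim the commit, write, then publish the new version."""
    if not cell.test_and_increment(seen):  # odd -> even, only if unchanged
        return False  # someone else committed since we read
    db.rows[key] = value
    db.version = seen + 2  # the eventual (next odd) version
    return cell.test_and_increment(seen + 1)  # even -> odd
```

If two clients both call begin() at version 1, the first commit() moves the version to 3 and the second commit() is rejected because its expected version is stale. A client that dies between steps 2 and 3 leaves the version even, which is the failure case discussed next.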
> The problem is failure - what happens if a client fails while it's writing
> a transaction? Eventually someone can increment the transaction number, and
> if you provide an 'undo' log before you make any changes, that client can
> possibly recover from a partial commit. But at this point you need to
> understand your application's requirements in much more detail than we do
> to make recommendations.
> In particular, your storage layer may offer sufficiently powerful
> primitives such that you don't need ZK; although if it's a filesystem then
> that probably isn't true.
> On 31 July 2013 15:51, Baskar Duraikannu <[EMAIL PROTECTED]> wrote:
> > We cannot always resolve conflicts ourselves. For example, let us say
> > a) user1 changed the name from 'Kathy' to 'Katherine'
> > b) user2 changes the name from 'Kathy' to 'Kat'
> > Both read 'Kathy' as input; user1's update succeeded. We need to let
> > user2 know that something has changed, as this may result in the user not
> > changing 'Kathy' to 'Kat' (as an example).
> > Hope this explains
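
For what it's worth, the 'Kathy' case above can be caught with a conventional per-row version check. The sketch below is only an illustration: Row, Conflict and save are hypothetical names, the table is a plain dict, and a real store would have to make the check-and-write atomic.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: str
    version: int = 0


class Conflict(Exception):
    """Raised when the row changed after the caller read it."""

    def __init__(self, current):
        super().__init__(f"row changed; current value is {current.value!r}")
        self.current = current  # lets the caller show the user what changed


def save(table, key, new_value, read_version):
    """Write new_value only if the row is still at the version the caller read."""
    row = table[key]
    if row.version != read_version:
        raise Conflict(row)
    # Not atomic here; a real store must do the check and the write together.
    table[key] = Row(new_value, row.version + 1)
```

With both users reading version 0, user1's save to 'Katherine' goes through and user2's save to 'Kat' raises Conflict carrying the current row, so user2 can decide whether the change still makes sense.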
> > > Date: Wed, 31 Jul 2013 07:49:39 -0400
> > > Subject: Re: Zookeeper performance
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > >
> > > This sounds highly error prone to me regardless of whether or not zookeeper
> > > can handle the load. Why not just use a standard transaction model with a
> > > vector clock or other timing device to detect conflicts so you don't have
> > > to worry about a second server to talk to (zookeeper) to do an update?
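
As a minimal sketch of the vector clock idea mentioned above: each writer keeps a counter per node, and two updates conflict when neither clock dominates the other. The representation (a dict from node name to counter) and the function names are assumptions made for this example.

```python
def tick(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a, b):
    """Classify clock a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a happened before b
    if b_le_a:
        return "after"  # b happened before a
    return "concurrent"  # neither saw the other: a conflict to resolve
```

Two edits that both start from the same clock and are ticked by different nodes compare as 'concurrent', which is the signal that someone has to resolve the conflict.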
> > > On Jul 31, 2013 7:17 AM, "Baskar Duraikannu" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hello
> > > >
> > > > We are looking to use zookeeper for optimistic concurrency. Basically, when
> > > > the user saves data on a screen, we need to lock, read to ensure no
> > > > one else has changed the row while the user is editing data, persist data,
> > > > and unlock the znode.
> > > >
> > > > If the app/thread does not get a lock, we may set a watch so that