Re: Zookeeper on short lived VMs and ZOOKEEPER-107
christian.ziech@... 2012-03-15, 17:45
Oh sorry, there is a slight misunderstanding. By VM I did not mean the Java VM but the Linux VM that hosts the ZooKeeper node. We get notified if that VM goes away and is repurposed.
Sent from my Nokia Lumia 800
From: ext Alexander Shraer
Sent: 15.03.2012 16:33
To: [EMAIL PROTECTED]; Ziech Christian (Nokia-LC/Berlin)
Subject: Re: Zookeeper on short lived VMs and ZOOKEEPER-107
Yes, by replacing at most x nodes at a time out of 2x+1 you keep quorum intersection.
I have one more question: ZooKeeper itself doesn't assume perfect
failure detection, which your scheme requires. What if the VM didn't
actually fail but is just slow and then tries to reconnect?
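
The quorum intersection claim above can be checked by brute force. The following is a minimal sketch, not ZooKeeper code: the class and method names are hypothetical, and it assumes that replaced nodes are really gone (the perfect failure detection questioned above). It enumerates every majority of an n-node ensemble and checks that each one contains at least one node that was not replaced.

```java
// Hypothetical helper for illustration only; not part of ZooKeeper.
class QuorumIntersection {
    // Size of a strict majority in an ensemble of n voters.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Nodes are numbered 0..n-1 and nodes 0..replaced-1 are assumed to have
    // been replaced by fresh machines. Returns true if every subset of at
    // least majority(n) nodes contains a node that was NOT replaced.
    static boolean everyMajorityHasSurvivor(int n, int replaced) {
        int replacedMask = (1 << replaced) - 1;
        for (int subset = 0; subset < (1 << n); subset++) {
            if (Integer.bitCount(subset) < majority(n)) {
                continue; // not a quorum
            }
            if ((subset & ~replacedMask) == 0) {
                return false; // a quorum made up of replaced nodes only
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2x+1 = 5 nodes, x = 2 replaced at once: intersection holds.
        System.out.println(everyMajorityHasSurvivor(5, 2)); // true
        // Replacing 3 of 5 at once: nodes 0, 1, 2 alone form a quorum.
        System.out.println(everyMajorityHasSurvivor(5, 3)); // false
    }
}
```

The check only shows that some old node sits in every new quorum; it says nothing about whether that node holds the latest state.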
On Thu, Mar 15, 2012 at 2:50 AM, Christian Ziech
<[EMAIL PROTECTED]> wrote:
> I don't think that we could be running into a split brain problem in our use case.
> Let me try to describe the scenario we are worried about (assuming an
> ensemble of 5 nodes A,B,C,D,E):
> - The ensemble is up and running and in sync
> - Node A with the host name "zookeeperA.whatever-domain.priv" goes down
> because the VM has gone away
> - That removal of the VM is detected and a new VM is spawned with the same
> host name "zookeeperA.whatever-domain.priv" - let's call that node A'
> - Node A' zookeeper wants to join the cluster - right now this gets rejected
> by the others since A' has a different IP address than A (and the old one is
> "cached" in the InetSocketAddress of the QuorumPeer instance)
> We could ensure that at any given time there is at most one node with
> host name "zookeeperA.whatever-domain.priv" known by the ensemble and that
> once a node is replaced, it would not come back. Also we could make sure
> that our ensemble is big enough to compensate for a replacement of up to
> x nodes at a time (setting it to x*2 + 1 nodes).
> So if I did not misjudge our problem, it should be (due to the
> restrictions) simpler than the problem to be solved by zookeeper-107. My
> intention is basically, by solving this smaller discrete problem, to not need
> to wait until zookeeper-107 makes it into a release (the assumption is
> that a smaller fix possibly has a chance to make it into the 3.4.x branch).
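
The rejection of A' described in the scenario comes from the fact that a java.net.InetSocketAddress built from a host name resolves it when the object is constructed and keeps that result. The following is a rough sketch of how a stale cached address could be detected. The class name, method name and port 3888 are assumptions for illustration, and the JVM may cache name lookups itself, so a fresh object is not guaranteed to see a changed DNS record immediately.

```java
import java.net.InetSocketAddress;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class StaleAddressCheck {
    // Builds a new InetSocketAddress from the cached host string and port,
    // which attempts name resolution again, and compares the resulting IP
    // with the one stored in the cached object.
    static boolean isStale(InetSocketAddress cached) {
        InetSocketAddress fresh =
                new InetSocketAddress(cached.getHostString(), cached.getPort());
        // getAddress() is null for an unresolved address, hence Objects.equals.
        return !Objects.equals(cached.getAddress(), fresh.getAddress());
    }

    public static void main(String[] args) {
        // An IP literal needs no DNS lookup, so this runs without a network.
        InetSocketAddress resolved = new InetSocketAddress("127.0.0.1", 3888);
        System.out.println(isStale(resolved));

        // An address that was never resolved differs from its fresh lookup.
        InetSocketAddress unresolved =
                InetSocketAddress.createUnresolved("127.0.0.1", 3888);
        System.out.println(isStale(unresolved));
    }
}
```
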
On 15.03.2012 07:46, ext Alexander Shraer wrote:
>> Hi Christian,
>> ZK-107 would indeed allow you to add/remove servers and change their
>> > We could ensure that we always have a more or less fixed quorum of
>> > zookeeper servers with a fixed set of host names.
>> You should probably also ensure that a majority of the old ensemble
>> intersects with a majority of the new one.
>> Otherwise you have to run a reconfiguration protocol similarly to ZK-107.
>> For example, if you have 3 servers A B and C, and now you're adding D and E
>> that replace B and C, how would this work ? it is probable that D and E
>> don't have the latest state (as you mention) and A is down or doesn't have
>> the latest state too (a minority might not have the latest state). Also, how
>> do you prevent split brain in this case ? meaning B and C thinking that they
>> are still operational ? perhaps I'm missing something but I suspect that the
>> change you propose won't be enough...
>> Best Regards,
>> On Wed, Mar 14, 2012 at 10:01 AM, Christian Ziech
>> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>> Just a small addition: In my opinion the patch could really boil
>> down to adding a
>> quorumServer.electionAddr = new
>> in the catch(IOException e) clause of the connectOne() method of
>> the QuorumCnxManager. In addition one should perhaps make the
>> electionAddr field in the QuorumPeer.QuorumServer class volatile
>> to prevent races.
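
The idea above (re-resolve the address in the catch clause, and make the field volatile) might look roughly like the sketch below. It is a simplified stand-in, not the actual QuorumCnxManager or QuorumPeer.QuorumServer code: the class PeerAddress and the method connectOnce are hypothetical, and the real connectOne() does considerably more.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical stand-in for QuorumPeer.QuorumServer; not the real class.
class PeerAddress {
    // volatile so that an address re-resolved by one thread becomes
    // visible to other threads that read the field afterwards.
    volatile InetSocketAddress electionAddr;

    PeerAddress(InetSocketAddress addr) {
        this.electionAddr = addr;
    }

    // Tries to connect once. On an IOException the cached address is
    // replaced by a new InetSocketAddress, which resolves the host name
    // again. Returns true if the connection attempt succeeded.
    boolean connectOnce(int timeoutMs) {
        InetSocketAddress addr = electionAddr; // single read of the field
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            electionAddr =
                    new InetSocketAddress(addr.getHostString(), addr.getPort());
            return false;
        }
    }
}
```

A caller would simply retry connectOnce() later; the next attempt then uses whatever IP the host name resolved to at the time of the failure.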
>> I haven't checked this change yet fully for implications but doing
>> a quick test on some machines at least showed it would solve our