RE: Possibility / consequences of having multiple elected leaders
Alexander Shraer 2012-03-08, 00:07
> Such a commit will be rejected due to an old epoch.
Ted, can you please point me to the place in the code where this check is performed?
Thanks a lot,
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 07, 2012 10:59 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Possibility / consequences of having multiple elected leaders
> This can be emulated on Linux by simply pausing the process.
> The correct behavior is that the old leader will freeze and if it comes
> back relatively soon, it will still be recognized as leader.
> If the pause is long enough, then the other members of the quorum will
> decide that they have lost contact with the leader and initiate a new
> leader election. That election will cause the epoch to be incremented.
> When the old leader returns, it may attempt to commit a change. Such a
> commit will be rejected due to an old epoch. Alternately, it will get a
> ping or a commit from the other servers, realize that it is behind, and
> initiate a resynchronization. Even if the old leader had started a commit
> before being paused, the commit will have either succeeded in becoming
> durable or not. Neither case will cause any discrepancies since the new
> election will cause the remaining quorum to agree on a correct state.
> In any case, the paused server should either survive as leader with the
> assent of a quorum or it should realize it is no longer the leader and
> transparently update itself to the current state of the quorum.
> On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner <[EMAIL PROTECTED]> wrote:
> > ...
> > This got us to wondering what would happen if the elected leader were
> > "frozen" in this manner? There are no guarantees where in the code it
> > would be hung to know for certain what would happen when it left this
> > state, but could there be any problems where the "frozen" server would
> > come out of this state still thinking it was the leader (since it was
> > stuck) when in fact another server had been elected in the meantime? I
> > would imagine this should resolve itself fairly quickly, but is there
> > still a possibility that this could lead to bad behavior? Typically if a
> > server fails I would imagine the zookeeper instance would die or lose
> > leadership because of an event (failed connection, etc.) but this seems
> > slightly different since the code would be blocked in a random state.
> > ...
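
For readers following the thread, the epoch check Ted describes can be
illustrated with a small sketch. This is not ZooKeeper's code and does not
answer where the check lives in the source tree; it is a hypothetical model
of epoch fencing. The names Follower, on_new_leader, on_proposal, make_zxid
and epoch_of are invented for this example. The only detail taken from
ZooKeeper itself is that a zxid is a 64-bit number whose high 32 bits carry
the epoch and whose low 32 bits carry a counter.

```python
from dataclasses import dataclass, field


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into one id."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def epoch_of(zxid: int) -> int:
    """Extract the epoch from the high 32 bits of a zxid."""
    return zxid >> 32


@dataclass
class Follower:
    """Hypothetical follower that fences out proposals from stale epochs."""
    current_epoch: int = 0
    log: list = field(default_factory=list)

    def on_new_leader(self, epoch: int) -> None:
        # A completed election moves the follower to a higher epoch.
        if epoch > self.current_epoch:
            self.current_epoch = epoch

    def on_proposal(self, zxid: int, payload: str) -> bool:
        # A proposal stamped with an older epoch comes from a deposed leader.
        if epoch_of(zxid) < self.current_epoch:
            return False
        self.log.append((zxid, payload))
        return True


follower = Follower()
follower.on_new_leader(1)
accepted = follower.on_proposal(make_zxid(1, 1), "set /a")  # leader of epoch 1

# The old leader is paused; the quorum elects a new leader in epoch 2.
follower.on_new_leader(2)

# The old leader wakes up and retries with its stale epoch.
stale = follower.on_proposal(make_zxid(1, 2), "set /b")
fresh = follower.on_proposal(make_zxid(2, 1), "set /c")
```

In this model the woken leader's proposal is refused purely because its
epoch is lower than the one the follower has already adopted, which is the
property the thread relies on: a paused leader cannot get a quorum to accept
writes once a newer epoch exists.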