RE: Possibility / consequences of having multiple elected leaders
> Such a commit will be rejected due to an old epoch.

Ted, can you please point me to the place in the code where this check is performed?

Thanks a lot,
Alex
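
For readers following the thread, the epoch check being asked about can be illustrated with a small sketch. It does not point at the actual ZooKeeper source. It relies only on the documented zxid layout (a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits). The Follower class and its method names are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: not ZooKeeper code. The Follower class and its
# methods are hypothetical; only the zxid bit layout is taken from the
# ZooKeeper documentation.

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into a single 64-bit transaction id."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    """Return the epoch stored in the high 32 bits of the zxid."""
    return zxid >> COUNTER_BITS


def counter_of(zxid):
    """Return the counter stored in the low 32 bits of the zxid."""
    return zxid & COUNTER_MASK


class Follower:
    """Hypothetical follower that tracks the newest epoch it has accepted."""

    def __init__(self, accepted_epoch):
        self.accepted_epoch = accepted_epoch
        self.log = []

    def on_new_leader(self, new_epoch):
        # A completed election moves the follower to the new, higher epoch.
        if new_epoch > self.accepted_epoch:
            self.accepted_epoch = new_epoch

    def on_proposal(self, zxid, payload):
        # A proposal stamped with an older epoch comes from a stale leader
        # and is rejected instead of being logged.
        if epoch_of(zxid) < self.accepted_epoch:
            return False
        self.log.append((zxid, payload))
        return True
```

In this sketch, a follower that has moved on to epoch 2 returns False for a proposal built with make_zxid(1, 2) and accepts one built with make_zxid(2, 1), which is the behavior described in the quoted message below.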

> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 07, 2012 10:59 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Possibility / consequences of having multiple elected leaders
>
> This can be emulated on Linux by simply pausing the process.
>
> The correct behavior is that the old leader will freeze and if it comes
> back relatively soon, it will still be recognized as leader.
>
> If the pause is long enough, then the other members of the quorum will
> decide that they have lost contact with the leader and initiate a new
> leader election.  That election will cause the epoch to be incremented.
> When the old leader returns, it may attempt to commit a change.  Such a
> commit will be rejected due to an old epoch.  Alternately, it will get a
> ping or a commit from the other servers and realize that it is behind and
> initiate a resynchronization.  Even if the old leader had started a commit
> before being paused, the commit will have either succeeded in becoming
> durable or not.  Neither case will cause any discrepancies since the leader
> election will cause the remaining quorum to agree on a correct state.
>
> In any case, the paused server should either survive as leader with the
> assent of a quorum or it should realize it is no longer the leader and
> transparently update itself to the current state of the quorum.
>
> On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner <[EMAIL PROTECTED]> wrote:
>
> > ...
> > This got us to wondering what would happen if the elected leader were
> > "frozen" in this manner?  There's no guarantees where in the code it would
> > be hung to know for certain what would happen when it left this state, but
> > could there be any problems where the "frozen" server would come out of
> > this state still thinking it was the leader (since it was stuck) when in
> > fact another server had been elected in the meantime?  I would imagine this
> > should resolve itself fairly quickly but is there still a possibility that
> > this could lead to bad behavior?  Typically if a server fails I would
> > imagine the zookeeper instance would die or lose leadership because of an
> > event (failed connection, etc) but this seems slightly different since the
> > code would be blocked in a random state.
> > ...
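
The "frozen" leader scenario discussed in this thread can be reproduced on Linux by pausing the server process, as suggested above. One way to do that, assumed here and not spelled out in the thread, is to send SIGSTOP and later SIGCONT. The helper name pause_process is hypothetical, and the pid would be that of the ZooKeeper server under test.

```python
import os
import signal
import time


def pause_process(pid, seconds):
    """Freeze the process with the given pid for `seconds`, then resume it.

    SIGSTOP cannot be caught or ignored, so the target stops at whatever
    point in its code it happens to be executing.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

Choosing a pause longer than the ensemble's timeout should trigger the new election described above, while a shorter pause should let the old leader continue. The exact threshold depends on the configured tick and sync limits.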