Zookeeper >> mail # user >> Dynamic reconfiguration

Re: Dynamic reconfiguration
Thanks Alex for the detailed explanations -- they really help to fill in my
understanding of the implementation left open by the papers/presentations
I've read (without having to read the code yet :-) ).  #2 is what I was
unsure of, but it makes perfect sense.

Obviously committing the new configuration to the internal database is a
prerequisite to committing on a server, but is writing the new *configuration
file* to disk also a prerequisite for committing the new configuration?
 I'm curious about this so I can match it with my observations, since
reading the configuration file is much easier than inspecting the database.
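As an aside on observing this from the outside: in the dynamic-reconfiguration work the active configuration is persisted in a dynamic config file whose name is suffixed with the config version in hex (e.g. `zoo.cfg.dynamic.100000000` -- an assumption about the naming; the helper below is an illustrative sketch, not ZooKeeper code):

```python
# Hedged sketch: parse the config version out of a dynamic config
# filename so you can compare, per server, which configuration has
# actually been written to disk. The filename format is an assumption.

def config_version_from_filename(filename):
    """Extract the hex config version suffix from a dynamic config filename."""
    return int(filename.rsplit(".", 1)[1], 16)

v_old = config_version_from_filename("zoo.cfg.dynamic.100000000")
v_new = config_version_from_filename("zoo.cfg.dynamic.200000000")
print(v_new > v_old)  # True: the second file reflects a later reconfig
```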


On Sat, Jul 28, 2012 at 11:02 AM, Alexander Shraer <[EMAIL PROTECTED]> wrote:

> Hi Jared,
> Figuring out what happened and how to recover is part of the
> reconfiguration protocol. I don't think that this is something you as a
> user should do, unless I misunderstand what you're trying to do. This
> should be handled by ZooKeeper just like it handles other failures without
> admin intervention.
> In your scenario, D-F come up and one of them is elected leader (since you
> said they know about the commit), so they start running the new config
> normally. When A-C come up, several things may happen:
> 1. During the preliminary FastLeaderElection, A-C will try to connect to D
> and E, and in fact they'll also try to connect to the members of the new
> config that they know was proposed. So chances are that someone in the new
> config will send them the new config file and they'll store it and act
> accordingly (connect as non-voting followers in the new config). To make
> this happen, I changed FastLeaderElection to talk with proposed configs (if
> known) and to piggyback the last active config you know of on all messages.
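The piggybacking in (1) boils down to a version comparison on every election message: a server that hears of a newer active config adopts it. A minimal sketch (illustrative Python, invented names; not the actual FastLeaderElection code):

```python
# Sketch of the config-adoption rule during leader election: each
# message carries the sender's last active config; a receiver on an
# older config stores the newer one and acts accordingly.

def maybe_adopt_config(my_config, received_config):
    """Adopt the peer's config if its version is newer than ours."""
    if received_config["version"] > my_config["version"]:
        return received_config  # store it and rejoin under the new config
    return my_config

old = {"version": 1, "members": {"A", "B", "C", "D", "E"}}
new = {"version": 2, "members": {"D", "E", "F", "G", "H"}}

# A server still on the old config hears from a new-config member:
current = maybe_adopt_config(old, new)
print(sorted(current["members"]))  # ['D', 'E', 'F', 'G', 'H']
```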
> 2. It's possible that somehow A-C complete FastLeaderElection without
> talking to D-F. But since a reconfiguration was committed, it was acked by
> a quorum of the old config (and a quorum of the new one). Therefore,
> whoever is "elected" in the old config knows about the reconfig proposal
> (this is guaranteed by normal ZooKeeper leader recovery). Before doing
> anything else, the new leader among A-C will try to complete the
> reconfiguration, which involves getting enough acks from a quorum of the
> new config. But in your scenario the servers in the new config will not
> connect to it because they moved on, so the candidate-leader will just give
> up and go back to (1) above.
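The "getting enough acks" step in (2) is a strict-majority test over the new configuration. A sketch under my reading of the protocol (not ZooKeeper code):

```python
# A reconfiguration can only be (re)activated once a majority of the
# NEW configuration's members have acked it; otherwise the candidate
# leader gives up and returns to election.

def can_activate(new_config_members, acked):
    """True iff a strict majority of the new config has acked."""
    return len(acked & new_config_members) > len(new_config_members) // 2

new_members = {"D", "E", "F", "G", "H"}
# In the scenario, D-F moved on and won't ack the old-config leader:
print(can_activate(new_members, set()))            # False -> give up, re-elect
print(can_activate(new_members, {"D", "E", "F"}))  # True  -> could activate
```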
> 3. In the remote chance that someone who heard about the reconfig commit
> connects to a candidate-leader who didn't hear about it, the first thing it
> does is tell that candidate-leader that it's not up to date, and the
> leader just updates its config file, gives up on being a leader and returns
> to (1). This was done by changing the first message that a
> follower/observer sends to a leader it is connecting to, even before the
> synchronization starts.
> Alex
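The handshake in (3) amounts to a version check on the first message a connecting peer sends the candidate leader, before sync begins. An illustrative sketch (field names invented):

```python
# Sketch of the first-message check: a candidate leader that learns of
# a newer active config from a connecting peer adopts it and abandons
# leadership (returning to election) instead of syncing the peer.

def handle_first_peer_message(leader_config, peer_config):
    """Decide whether the candidate leader keeps leading or steps down."""
    if peer_config["version"] > leader_config["version"]:
        return ("step_down", peer_config)  # update config file, re-elect
    return ("lead", leader_config)

stale = {"version": 1, "members": {"A", "B", "C", "D", "E"}}
fresh = {"version": 2, "members": {"D", "E", "F", "G", "H"}}
action, cfg = handle_first_peer_message(stale, fresh)
print(action)  # step_down
```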
> On Sat, Jul 28, 2012 at 8:43 AM, Jared Cantwell <[EMAIL PROTECTED]> wrote:
>> So I'm working through some failure scenarios and I want to make sure I
>> fully understand how dynamic membership changes the previous behavior.
>> Are my expectations correct in this situation:
>> As in my previous example, let's say that the current membership of voting
>> participants is {A,B,C,D,E} and we're looking to change membership to
>> {D,E,F,G,H}.
>> 1. Reconfiguration to {D,E,F,G,H} completes internally
>> 2. D-F update their local configuration files, but A-C do not yet.
>> 3. Power loss to all nodes
>> Now what happens if A,B, and C come up with configuration files that
>> still say {A,B,C,D,E}, but no other servers start up yet?  Can A,B and C
>> form a quorum and elect a leader since they all agree on the same state?
>>  What then happens when the new membership of D-H starts up?
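Whether A, B, and C can elect a leader at all is a majority check over the config they have on disk; in this scenario {A,B,C} is 3 of the 5 members of {A,B,C,D,E}. A sketch of that arithmetic (illustrative, not ZooKeeper code):

```python
# Quorum in ZooKeeper's default (majority) scheme: strictly more than
# half of the configuration's voting members must participate.

def is_quorum(alive, members):
    """True iff the alive set is a strict majority of the members."""
    return len(alive & members) > len(members) // 2

old_config = {"A", "B", "C", "D", "E"}
print(is_quorum({"A", "B", "C"}, old_config))  # True: 3 of 5 is a majority
print(is_quorum({"A", "B"}, old_config))       # False: 2 of 5 is not
```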
>> We're trying to automatically handle node failures during reconfiguration
>> situations, but it seems like without being able to query all nodes to make