michael.boom 2013-11-19, 10:12
michael.boom 2013-11-20, 09:46
German Blanco 2013-11-20, 11:24
michael.boom 2013-11-20, 13:24
Rakesh R 2013-11-20, 13:45
michael.boom 2013-11-20, 14:17
Rakesh R 2013-11-20, 14:30
michael.boom 2013-11-20, 15:07
Ted Dunning 2013-11-20, 15:19
michael.boom 2013-11-20, 15:32
Bryan Thompson 2013-11-20, 15:35
michael.boom 2013-11-20, 17:11
Rakesh R 2013-11-21, 05:25
michael.boom 2013-11-21, 10:20
I am interested in this procedure, but I have never attempted it myself.
It seems that the concept advanced in  is to manually replicate the
data, then start a new ensemble with different ports using the replicated
data, and finally instruct your clients to talk to the new ensemble. This
procedure would definitely cause any ephemeral znodes to be lost during
the migration since the client connections could not be transferred to the
new ensemble without being dropped. Essentially, each client becomes
disconnected from the standalone server instance and then is reconnected
to the new ensemble of highly available server instances.
Given that the clients must become disconnected, at least temporarily, it
seems that you cannot obtain 100% uptime from a client perspective
during this migration. I.e., each client would have to be either
restarted or (if you architect for it in your client) it would have to be
instructed through an API that it should disconnect from one zk server
configuration and connect to another. Either way, the client would be
disconnected during its transition.
However, the service provided by your clients could remain up as long as
that service was able to move transparently from clients connected to the
standalone node to clients connected to the ensemble.
But at the service level, there would have to be some point at which you
stopped relying on the data in the standalone instance and began to rely
on the data in the new ensemble. If there are writes on the standalone
instance after you manually replicate its data, then those writes would
not be present in the ensemble. From a service level, those writes would
have been lost.
I would be interested in a procedure to make this migration seamless, but
I can't see how it would be accomplished without:
- halt writes on zookeeper.
- replicate zookeeper standalone server state to a zookeeper ensemble with
at least two instances (a quorum can meet with two servers, although a
two-server ensemble cannot tolerate the loss of either one). The servers
will need their myid files. If you start one of these servers on the same
machine, then you need to use a different client port for the new ensemble.
- start the servers in the new ensemble. Quorum should meet. Leader should
be elected, etc.
- change the client configuration to point to the servers in the new
ensemble.
- restart the clients. This moves them from the old standalone zookeeper
instance (which nobody should be writing on) to the new ensemble (which is
- terminate the old standalone zookeeper server instance
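On the quorum remark in the steps above: ZooKeeper needs a strict majority
of the configured servers to be up before it will serve requests. A minimal
sketch of that arithmetic follows; the function names are my own and are
not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the configured servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum can still form."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures side by side.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A two-server ensemble needs both servers for a quorum, so it is no more
tolerant of failure than the standalone server. Three servers is the
smallest size that survives the loss of one.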
I think that a procedure to increase the replication count of a zookeeper
ensemble would be similar:
- start a new service in zookeeper ensemble. This service should know
about the original servers plus itself.
- for each existing zookeeper service, change the server configurations to
include the new server and restart the service (rolling restart). This
makes the services mutually aware of the new server.
- for each client, change the client configuration to include the new
zookeeper ensemble list and restart that client.
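The configuration changes in those steps might be sketched as below. The
server.N=host:peerPort:electionPort lines are the standard zoo.cfg syntax,
but the hostnames, the ports, and the helper names are assumptions made up
for illustration. I have not run this against a live ensemble.

```python
def server_lines(servers):
    """Render the server.N entries that every zoo.cfg in the ensemble shares."""
    lines = []
    for sid in sorted(servers):
        host, peer_port, election_port = servers[sid]
        lines.append(f"server.{sid}={host}:{peer_port}:{election_port}")
    return lines


def connect_string(servers, client_port=2181):
    """Comma-separated host:port list that the clients are configured with."""
    return ",".join(f"{servers[sid][0]}:{client_port}" for sid in sorted(servers))


# Hypothetical existing ensemble: id -> (host, peer port, election port).
old = {
    1: ("zk1.example.com", 2888, 3888),
    2: ("zk2.example.com", 2888, 3888),
}

# The new server knows about the original servers plus itself.
new = dict(old)
new[3] = ("zk3.example.com", 2888, 3888)

print("\n".join(server_lines(new)))  # goes into every server's zoo.cfg
print(connect_string(new))           # goes into every client's configuration
```

The new server is started with the full list. Each existing server is then
given the same list and restarted in turn, and the clients get the new
connect string last.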
Given all of this, I suggest that the right way to move from a single node
deployment to a highly available deployment is to begin with a zookeeper
ensemble running on the initial node.
- Begin with a single node with 3 zookeeper server instances configured as
an ensemble (there are instructions somewhere for running multiple zk
instances on the same node - the ports need to be specified such that they
do not conflict).
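A rough sketch of what those three configurations could look like,
generated from Python so the port arithmetic is explicit. The zoo.cfg keys
(tickTime, initLimit, syncLimit, dataDir, clientPort, server.N) are the
standard ones; the directory layout, the port numbers, and the function
name are assumptions chosen for the example.

```python
def single_host_configs(host="localhost", count=3, base_dir="/var/lib/zookeeper"):
    """Return {server id: {"zoo.cfg": text, "myid": text}} for one machine."""
    # Each instance gets its own peer and election port so nothing collides.
    servers = {sid: (host, 2887 + sid, 3887 + sid) for sid in range(1, count + 1)}
    configs = {}
    for sid in sorted(servers):
        lines = [
            "tickTime=2000",
            "initLimit=10",
            "syncLimit=5",
            f"dataDir={base_dir}/zk{sid}",  # one data directory per instance
            f"clientPort={2180 + sid}",     # one client port per instance
        ]
        for n in sorted(servers):
            h, peer_port, election_port = servers[n]
            lines.append(f"server.{n}={h}:{peer_port}:{election_port}")
        # The myid file lives in dataDir and holds only the server's id.
        configs[sid] = {"zoo.cfg": "\n".join(lines) + "\n", "myid": f"{sid}\n"}
    return configs


for sid, files in single_host_configs().items():
    print(f"--- instance {sid} ---")
    print(files["zoo.cfg"], end="")
```

With these defaults the instances listen for clients on 2181, 2182, and
2183, and each one has a distinct pair of peer and election ports.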
To move from a single node to multiple nodes:
- Configure and start a new zookeeper server instance on another node. It
should know about 2 of the original instances.
- Rolling reconfigure and restart of the zookeeper services. The server
instance that is being migrated is terminated rather than being restarted.
- Rolling reconfigure and restart of the zookeeper clients. On restart,
the client will know about the new zookeeper ensemble locations.
This would leave you with two zookeeper server instances on the original
node and one somewhere else.
You would then repeat that procedure to migrate one of the two remaining
zookeeper server instances to another node. That would give you one
zookeeper service per node.
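To see what each migration step buys you, here is a small sketch that
tracks which nodes would take the whole ensemble down if they failed. It
only does the majority arithmetic and does not talk to zookeeper; the node
names and helper names are hypothetical.

```python
def survives_loss_of(placement, host):
    """True if the servers left after losing `host` still form a majority."""
    total = len(placement)
    remaining = sum(1 for h in placement.values() if h != host)
    return remaining >= total // 2 + 1


def single_points_of_failure(placement):
    """Hosts whose loss would leave the ensemble without a quorum."""
    hosts = set(placement.values())
    return sorted(h for h in hosts if not survives_loss_of(placement, h))


# Server id -> node. Start with all three instances on the original node.
placement = {1: "node-a", 2: "node-a", 3: "node-a"}
steps = [(3, "node-b"), (2, "node-c")]  # move one instance at a time

history = [single_points_of_failure(placement)]
for sid, new_host in steps:
    placement = {**placement, sid: new_host}
    history.append(single_points_of_failure(placement))

print(history)
```

After the first move the original node still hosts two of the three
instances, so losing it still costs you the quorum. Only after the second
move can any single node fail without taking zookeeper down.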
You could then follow the procedure to increase the replication count if
you wanted to increase the availability of zookeeper beyond those three
I have not tested any of this. This is just the way I could see it
working based on my understanding of zookeeper. I am interested in a
procedure for managing this because we have a service that uses zookeeper
to coordinate failover. We can manage the increase of replication in our
own services and their durable state easily enough, but I am not sure how
to manage this for zookeeper. All of the above is complicated enough that
it seems it would be easier to begin with three VMs running zookeeper and
then migrate the VMs if necessary, ideally without changing their IPs.