Zookeeper, mail # user - How to join quorum without restarting existing servers


Re: How to join quorum without restarting existing servers
German Blanco 2013-11-07, 04:34
Hello again,

I don't think it is a good idea to start a new thread with the same
issue. Please continue in the latest thread.

Could this be a DNS resolution caching problem?
See https://issues.apache.org/jira/browse/ZOOKEEPER-1506

The new server has the lowest sid. It is able to connect to all other
servers, but the rest of the servers don't seem able to connect to it.
Connections from this server to the rest are useless, since they are
dropped because of the sid comparison that you see in the log.
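
To make the sid comparison concrete, here is a minimal sketch of the rule as
I understand it. It is a simplified model, not the actual ZooKeeper code; the
function names and the `peers_can_reach_new` flag are made up for
illustration.

```python
def handle_incoming(my_sid: int, remote_sid: int) -> str:
    """Decide what a server does with an inbound election connection."""
    if remote_sid < my_sid:
        # The initiator has the lower sid: its connection is dropped and
        # this server dials back, so the higher sid always owns the link.
        return "drop_and_dial_back"
    return "accept"


def usable_links(new_sid, peer_sids, peers_can_reach_new):
    """Return the peers that end up with a working link to the new server."""
    links = []
    for peer in peer_sids:
        if handle_incoming(my_sid=peer, remote_sid=new_sid) == "accept":
            links.append(peer)  # the new server's own connection is kept
        elif peers_can_reach_new:
            links.append(peer)  # the peer's dial-back succeeded
    return links
```

In this model a new server with the lowest sid gets no links at all when the
peers cannot reach it (for example, because they hold a stale address), while
a new server with the highest sid keeps every connection it opens itself.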

You could try changing the server addresses in the configuration to the AWS
public IP addresses of the peers, just to test whether that works. Or try
replacing the server with the highest sid; that should also work. Otherwise
(assuming the problem is DNS resolution), the only current workaround that
I can think of is the rolling restart, as you have noticed.
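
For the IP address test, a small helper along these lines could rewrite the
server lines. It is only a sketch: it assumes the usual
`server.N=host:port:port` layout, and the hostnames and addresses in the usage
example are placeholders, not values from this thread.

```python
import re

# Matches lines such as "server.1=zk1.example.com:2888:3888".
SERVER_LINE = re.compile(r"^(server\.\d+=)([^:]+)(:.*)$")


def use_ip_addresses(cfg_text: str, ip_by_host: dict) -> str:
    """Replace hostnames in server.N lines with the given IP addresses."""
    out = []
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m and m.group(2) in ip_by_host:
            # Keep the "server.N=" prefix and the ports, swap only the host.
            line = m.group(1) + ip_by_host[m.group(2)] + m.group(3)
        out.append(line)
    return "\n".join(out)


# Usage with placeholder values:
cfg = "tickTime=2000\nserver.1=zk1.example.com:2888:3888"
new_cfg = use_ip_addresses(cfg, {"zk1.example.com": "203.0.113.10"})
```

Lines that are not server entries, and hosts missing from the mapping, are
left untouched, so the change is easy to revert after the test.
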
On Wed, Nov 6, 2013 at 6:39 PM, Diego Oliveira <[EMAIL PROTECTED]> wrote:

> Bae,
>
>    Just a note: when using Zookeeper in Amazon AWS, the instance IP
> reallocation at restart is a nightmare. One solution is to do as you said,
> using an elastic IP, but the maximum of 5 is limiting. Another option is to
> configure a VPC. I had these problems last year.
>
> Att,
>       Diego.
>
>
> On Tue, Nov 5, 2013 at 4:18 PM, Bae, Jae Hyeon <[EMAIL PROTECTED]> wrote:
>
> > I am attaching the log file. Could you take a look at why the new instance
> > cannot join the quorum?
> >
> >
> > On Tue, Nov 5, 2013 at 9:52 AM, Bae, Jae Hyeon <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks a lot Ben
> >>
> >> We are also using zookeeper in AWS with elastic IP. The reason I asked
> >> this question is that when the bad Zookeeper EC2 instance is terminated
> >> and a new instance is launched with the previous elastic IP, it cannot
> >> join the quorum, and there are no specific error messages. But when I did
> >> a rolling restart, the new instance started normally, synchronized and
> >> joined the quorum.
> >>
> >> As I understand German's response, the new instance should start,
> >> synchronize, and join quorum successfully without any impact on existing
> >> instances but it didn't. I will investigate further.
> >>
> >> Thank you
> >> Best, Jae
> >>
> >>
> >> On Tue, Nov 5, 2013 at 8:24 AM, Ben Hall <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Jae,
> >>>
> >>> I wrote that article several years ago. (tbh - I hope it is not totally
> >>> out of date by now).  I agree with German's points.
> >>>
> >>> The issue it was solving was to replace a bad server without having to
> >>> shut down the ensemble and without having to update the config files on
> >>> each server. I would also add that this only works as long as the server
> >>> names and ports are the same - iirc at the time the article was written we
> >>> were using servers in AWS and referencing them either by assigned
> >>> hostnames such as zookeeper-[01|11] or by elastic IPs that could be moved
> >>> from server to server.
> >>>
> >>> If I understand your question correctly, if you are "adding a new server"
> >>> such as going from 7 to 9 servers, then this approach won't benefit you.
> >>>
> >>> We also used this approach when we would upgrade the servers, but like
> >>> German said we did it one server at a time so that the Leader election
> >>> could be natural.  This allowed us to upgrade a pool of 11 servers that
> >>> were responsible for many thousands of client connections without any
> >>> downtime.
> >>>
> >>> Thanks
> >>> Ben
> >>>
> >>>
> >>> On 11/5/13 6:51 AM, "German Blanco" <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>> >... and make sure that there is no rubbish in the data dir of the new
> >>> >server.
> >>> >
> >>> >
> >>> >On Tue, Nov 5, 2013 at 3:49 PM, German Blanco <
> >>> >[EMAIL PROTECTED]> wrote:
> >>> >
> >>> >> Hello Jae,
> >>> >>
> >>> >> I think that the answer to your question is "no, there is no benefit in
> >>> >> a rolling restart in that case".
> >>> >> If you remove a machine that was hosting a zookeeper server that was