Zookeeper, mail # dev - Performing no downtime hardware changes to a live zookeeper cluster


Neha Narkhede 2011-12-20, 20:14
Camille Fournier 2011-12-20, 20:26
Ted Dunning 2011-12-20, 21:06
Neha Narkhede 2011-12-22, 01:43
Camille Fournier 2011-12-22, 03:21
Neha Narkhede 2012-01-09, 18:15
Camille Fournier 2012-01-09, 18:31
Neha Narkhede 2012-01-09, 18:51
Camille Fournier 2012-01-09, 19:04
Neha Narkhede 2012-01-09, 20:33
Re: Performing no downtime hardware changes to a live zookeeper cluster
Camille Fournier 2012-01-09, 20:47
Sounds fine with me, though it should probably be a flaggable option.

C
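
To make the "flaggable option" concrete, here is a minimal sketch of re-resolving a host name on each reconnect attempt, guarded by a flag. This is not the ZooKeeper client's actual code: the class name ReResolvingHostList, the reResolveOnReconnect flag, and the injected resolver function are all hypothetical names chosen for illustration. It assumes the client is given a plain list of host names (not a single DNS RR name).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; these are not ZooKeeper client classes. */
final class ReResolvingHostList {
    private final List<String> hostNames;
    private final Function<String, List<String>> resolver; // host name -> IP strings
    private final boolean reResolveOnReconnect;            // the "flaggable option"
    private final List<List<String>> cached = new ArrayList<>();
    private int next = 0;

    ReResolvingHostList(List<String> hostNames,
                        Function<String, List<String>> resolver,
                        boolean reResolveOnReconnect) {
        if (hostNames.isEmpty()) {
            throw new IllegalArgumentException("at least one host name is required");
        }
        this.hostNames = new ArrayList<>(hostNames);
        this.resolver = resolver;
        this.reResolveOnReconnect = reResolveOnReconnect;
        // Initial resolution: every name is resolved once up front.
        for (String name : this.hostNames) {
            cached.add(resolver.apply(name));
        }
    }

    /** Returns the addresses to try for the next reconnect attempt. */
    List<String> nextAddresses() {
        int i = next;
        next = (next + 1) % hostNames.size();
        if (reResolveOnReconnect) {
            // Flag on: look the name up again so a re-aliased name yields the new IP.
            cached.set(i, resolver.apply(hostNames.get(i)));
        }
        // Flag off: keep returning what was resolved at construction time.
        return cached.get(i);
    }

    /** A resolver backed by the JVM's own lookup; returns an empty list on failure. */
    static List<String> resolveWithDns(String name) {
        List<String> ips = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(name)) {
                ips.add(a.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable right now; the caller simply moves on to the next name.
        }
        return ips;
    }
}
```

With the flag off, behavior stays as it is today, so DNS RR users would be unaffected. With the flag on, re-aliasing a name is picked up on the next reconnect, provided the JVM's own DNS cache does not hide the change.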
On Mon, Jan 9, 2012 at 3:33 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> >> If you just have machine names in a list that you pass in, then yes, we
> >> could re-resolve on every reconnect and you could just re-alias that name
> >> to a new IP. But you'll have to put in logic that will do that but not
> >> break people using DNS RR.
>
> Having a list of machine names that can be re-pointed to new IPs seems
> reasonable too. To do the upgrade without restarting all clients, besides
> turning off DNS caching in the JVM, we still have to solve the problem of
> the ZooKeeper client caching the IPs in code. Having two levels of DNS
> caching, one in the JVM and one in code (which cannot be turned off),
> doesn't look like a good idea. Unless I'm missing the purpose of such IP
> caching in ZooKeeper?
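
For the JVM-level half of the caching mentioned above, a minimal sketch of turning off the JVM's DNS cache from code. The security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl are standard Java networking properties; the wrapper class and method names are hypothetical. It is assumed here that this runs early in startup, before the first name lookup, since the JVM may read the setting only once. This does nothing about the second level of caching inside the ZooKeeper client itself.

```java
import java.security.Security;

/** Hypothetical helper; only the two property names come from the JDK. */
final class DnsCacheSettings {
    /** Asks the JVM not to cache successful or failed name lookups. */
    static void disableJvmDnsCaching() {
        // 0 means "do not cache"; -1 would mean "cache forever".
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    public static void main(String[] args) {
        disableJvmDnsCaching();
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```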
>
> >> I realize that moving machines is difficult when you have lots of
> >> clients. I'm a bit surprised your admins can't maintain machine IP
> >> addresses on a machine move given a cluster of that complexity, though.
>
> It's not that it can't be done, but it carries quite a bit of
> operational overhead. We are trying to brainstorm various approaches
> and come up with one that involves the least overhead on such
> upgrades going forward.
>
> Having said that, re-resolving host names on reconnect doesn't look
> like a bad idea, provided it doesn't break the DNS RR use case. If
> that sounds good, can I go ahead and file a JIRA for this?
>
> Thanks,
> Neha
>
> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <[EMAIL PROTECTED]>
> wrote:
> > We don't shuffle IPs after the initial resolution of IP addresses.
> >
> > In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
> > robin through them trying to connect. If you re-resolve on every
> > round-robin, you have to put in logic to know which ones have changed and
> > somehow maintain that shuffle order or you aren't doing a fair back end
> > round robin, which people using the ZK client against DNS RR are relying
> > on today.
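
The resolve-once, shuffle, then round-robin behavior described above can be sketched roughly as follows. This is an illustration of the idea only, not the ZooKeeper client's code; the class name ShuffledRoundRobin is hypothetical, and the already-resolved addresses are passed in as strings so the example does not depend on a live DNS server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of "resolve once, shuffle once, then round robin". */
final class ShuffledRoundRobin {
    private final List<String> addresses;
    private int index = -1;

    ShuffledRoundRobin(List<String> resolvedAddresses, Random random) {
        if (resolvedAddresses.isEmpty()) {
            throw new IllegalArgumentException("no addresses to connect to");
        }
        // Copy, then shuffle exactly once; the order is fixed from here on.
        this.addresses = new ArrayList<>(resolvedAddresses);
        Collections.shuffle(this.addresses, random);
    }

    /** Returns the next address to try, cycling through the fixed shuffled order. */
    String next() {
        index = (index + 1) % addresses.size();
        return addresses.get(index);
    }
}
```

Because the shuffle happens once, each client walks its own stable order. Re-resolving and re-shuffling on every attempt would discard that order, which is the fairness concern raised above.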
> >
> > If you just have machine names in a list that you pass in, then yes, we
> > could re-resolve on every reconnect and you could just re-alias that name
> > to a new IP. But you'll have to put in logic that will do that but not
> > break people using DNS RR.
> >
> > I realize that moving machines is difficult when you have lots of
> > clients. I'm a bit surprised your admins can't maintain machine IP
> > addresses on a machine move given a cluster of that complexity, though.
> > I also think that if we're going to be putting special cases like this in
> > we might just want to go all the way to a pluggable reconnection scheme,
> > but maybe that is too aggressive.
> >
> > C
> >
> > On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
> >> simplest implementation which resolves a hostname to multiple IPs.
> >>
> >> Whatever method you use to map host names to IPs, the problem is that
> >> the zookeeper client code will always cache the IPs. So to be able to
> >> swap out a machine, all clients would have to be restarted, which if
> >> you have 100s of clients, is a major pain. If you want to move the
> >> entire cluster to new machines, this becomes even harder.
> >>
> >> I don't see why re-resolving host names to IPs in the reconnect logic
> >> is a problem for zookeeper, since you shuffle the list of IPs anyways.
> >>
> >> Thanks,
> >> Neha
> >>
> >>
> >> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <[EMAIL PROTECTED]>
> >> wrote:
> >> > You can't sensibly round robin within the client code if you
> >> > re-resolve on every reconnect, if you're using dns rr. If that's your
> >> > goal you'd want a list of dns alias names and re-resolve each hostname
> >> > when you hit it on reconnect. But that will break people using dns rr.
Alexander Shraer 2012-01-09, 22:23
Ted Dunning 2012-01-09, 23:17
Patrick Hunt 2012-01-10, 01:36
Neha Narkhede 2012-01-10, 02:49
Patrick Hunt 2012-01-10, 16:39