Zookeeper, mail # dev - Performing no downtime hardware changes to a live zookeeper cluster


Neha Narkhede 2011-12-20, 20:14
Camille Fournier 2011-12-20, 20:26
Ted Dunning 2011-12-20, 21:06
Neha Narkhede 2011-12-22, 01:43
Camille Fournier 2011-12-22, 03:21
Neha Narkhede 2012-01-09, 18:15
Camille Fournier 2012-01-09, 18:31
Neha Narkhede 2012-01-09, 18:51
Camille Fournier 2012-01-09, 19:04
Neha Narkhede 2012-01-09, 20:33
Camille Fournier 2012-01-09, 20:47
Alexander Shraer 2012-01-09, 22:23
Ted Dunning 2012-01-09, 23:17
Patrick Hunt 2012-01-10, 01:36
Re: Performing no downtime hardware changes to a live zookeeper cluster
Neha Narkhede 2012-01-10, 02:49
Patrick,

Looks like https://issues.apache.org/jira/browse/ZOOKEEPER-1356 is a
duplicate of ZOOKEEPER-338? If so, I'll mark it as a duplicate.

Thanks,
Neha

On Mon, Jan 9, 2012 at 5:36 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> dup of https://issues.apache.org/jira/browse/ZOOKEEPER-338 ?
>
> Patrick
>
> On Mon, Jan 9, 2012 at 3:17 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> Neha
>>
>> Filing a jira is a great way to further the discussion.
>>
>> Sent from my iPhone
>>
>> On Jan 9, 2012, at 15:33, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>>
>>>>> If you just have machine names in a list that you pass in, then yes, we
>>> could re-resolve on every reconnect and you could just re-alias that name
>>> to a new IP. But you'll have to put in logic that will do that but not
>>> break people using DNS RR.
>>>
>>> Having a list of machine names that can be changed to point to new IPs
>>> seems reasonable too. To be able to do the upgrade without having to
>>> restart all clients, besides turning off DNS caching in the JVM, we
>>> still have to solve the problem of zookeeper client caching the IPs in
>>> code. Having two levels of DNS caching, one in the JVM and one in code
>>> (which cannot be turned off), doesn't look like a good idea. Unless I'm
>>> missing the purpose of such IP caching in ZooKeeper?
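
To make the JVM-level half of that caching concrete, here is a minimal sketch of turning it off through the standard `networkaddress.cache.ttl` security property. The class and method names (`DnsCacheConfig`, `setDnsCacheTtlSeconds`) are hypothetical and only for illustration. This affects the JVM's own lookup cache only; it does nothing about the IP list the ZooKeeper client holds in its own code, which is the second level of caching discussed above.

```java
import java.security.Security;

// Hypothetical helper class, not part of ZooKeeper.
class DnsCacheConfig {
    /**
     * Sets the JVM-wide TTL, in seconds, for successful DNS lookups.
     * 0 disables caching; -1 caches forever.
     * Call this early, before the JVM performs its first name lookup,
     * because the caching policy is read when InetAddress is initialized.
     */
    static void setDnsCacheTtlSeconds(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        setDnsCacheTtlSeconds(0);
        // Read the property back to confirm it was set.
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same property can also be set in the JRE's `java.security` file instead of in code, which avoids having to touch every client application.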
>>>
>>>>> I realize that moving machines is difficult when you have lots of clients.
>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>> machine move given a cluster of that complexity, though
>>>
>>> It's not that it can't be done, but it definitely has quite some
>>> operational overhead. We are trying to brainstorm various approaches
>>> and come up with one that will involve the least overhead on such
>>> upgrades going forward.
>>>
>>> Having said that, re-resolving host names on reconnect doesn't look
>>> like a bad idea, provided it doesn't break the DNS RR use case. If
>>> that sounds good, can I go ahead and file a JIRA for this?
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <[EMAIL PROTECTED]> wrote:
>>>> We don't shuffle IPs after the initial resolution of IP addresses.
>>>>
>>>> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
>>>> robin through them trying to connect. If you re-resolve on every
>>>> round-robin, you have to put in logic to know which ones have changed and
>>>> somehow maintain that shuffle order or you aren't doing a fair back end
>>>> round robin, which people using the ZK client against DNS RR are relying on
>>>> today.
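
The resolve-once, shuffle-once, round-robin behaviour described in the paragraph above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ZooKeeper client's actual code: `ShuffledServerList`, `resolveAll`, and `nextServer` are hypothetical names. The point it shows is that the shuffle order is fixed at construction time, so every address is visited once per pass.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration, not the ZooKeeper client implementation.
class ShuffledServerList {
    private final List<InetSocketAddress> servers;
    private int next = 0;

    /** Copies the resolved addresses and shuffles them exactly once. */
    ShuffledServerList(List<InetSocketAddress> resolved, Random rng) {
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        this.servers = new ArrayList<>(resolved);
        Collections.shuffle(this.servers, rng);
    }

    /** Resolves one DNS name to every address it maps to (the DNS RR case). */
    static List<InetSocketAddress> resolveAll(String host, int port)
            throws UnknownHostException {
        List<InetSocketAddress> out = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            out.add(new InetSocketAddress(address, port));
        }
        return out;
    }

    /** Returns the next server in the fixed shuffled order, wrapping around. */
    InetSocketAddress nextServer() {
        InetSocketAddress server = servers.get(next);
        next = (next + 1) % servers.size();
        return server;
    }
}
```

Because the list is never refreshed after construction, a server that moves to a new IP stays invisible to the client until it restarts, which is the operational problem raised in this thread.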
>>>>
>>>> If you just have machine names in a list that you pass in, then yes, we
>>>> could re-resolve on every reconnect and you could just re-alias that name
>>>> to a new IP. But you'll have to put in logic that will do that but not
>>>> break people using DNS RR.
>>>>
>>>> I realize that moving machines is difficult when you have lots of clients.
>>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>>> machine move given a cluster of that complexity, though. I also think that
>>>> if we're going to be putting special cases like this in we might just want
>>>> to go all the way to a pluggable reconnection scheme, but maybe that is too
>>>> aggressive.
>>>>
>>>> C
>>>>
>>>>> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>>>>> simplest implementation which resolves a hostname to multiple IPs.
>>>>>
>>>>> Whatever method you use to map host names to IPs, the problem is that
>>>>> the zookeeper client code will always cache the IPs. So to be able to
>>>>> swap out a machine, all clients would have to be restarted, which if
>>>>> you have 100s of clients, is a major pain. If you want to move the
>>>>> entire cluster to new machines, this becomes even harder.
>>>>>
>>>>> I don't see why re-resolving host names to IPs in the reconnect logic
>>>>> is a problem for zookeeper, since you shuffle the list of IPs anyways.
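
For the simpler case raised in this thread, where the client is given a list of machine names rather than one round-robin DNS name, re-resolving on reconnect could look roughly like the sketch below. All names here (`ReResolvingHostList`, `systemResolver`, `nextServer`) are hypothetical and this is not how the ZooKeeper client is written. The sketch shuffles the host names once and looks each name up again every time it is handed out, so a name re-aliased to a new IP is picked up without a client restart. It deliberately does not handle the DNS RR case, where one name maps to many IPs and the shuffle order of those IPs has to be preserved.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not the ZooKeeper client implementation.
class ReResolvingHostList {
    private final List<String> hosts;                      // names, shuffled once
    private final int port;
    private final Function<String, InetAddress> resolver;  // lookup hook (an assumption)
    private int next = 0;

    ReResolvingHostList(List<String> hostNames, int port,
                        Function<String, InetAddress> resolver) {
        if (hostNames.isEmpty()) {
            throw new IllegalArgumentException("no hosts");
        }
        this.hosts = new ArrayList<>(hostNames);
        Collections.shuffle(this.hosts);
        this.port = port;
        this.resolver = resolver;
    }

    /** Default lookup hook backed by InetAddress.getByName. */
    static Function<String, InetAddress> systemResolver() {
        return host -> {
            try {
                return InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Called on each reconnect attempt; the name is resolved at call time. */
    InetSocketAddress nextServer() {
        String host = hosts.get(next);
        next = (next + 1) % hosts.size();
        InetAddress address = resolver.apply(host);
        if (address == null) {
            throw new IllegalStateException("could not resolve " + host);
        }
        return new InetSocketAddress(address, port);
    }
}
```

With `systemResolver()`, fresh results also depend on the JVM's own DNS cache TTL, since `InetAddress.getByName` may return a cached answer.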
Patrick Hunt 2012-01-10, 16:39