Zookeeper >> mail # user >> Zookeeper on short lived VMs and ZOOKEEPER-107


Re: Zookeeper on short lived VMs and ZOOKEEPER-107
Wrt PS2: the rule is that bug-fix releases are only for bug fixes, so
3.4.x shouldn't have more features than 3.4.0. This really is something
for 3.5.0. It would be nice to have shorter release cycles; 3.4.0 was
released in November, so we should be doing a 3.5.0 release some time
relatively soon.

ben

On Fri, Mar 16, 2012 at 2:56 AM, Christian Ziech
<[EMAIL PROTECTED]> wrote:
> Under normal circumstances we should be able to detect failures
> correctly. The scenario I have in mind is one where a zookeeper machine
> is taken down for some reason and then either just rebooted or rebuilt
> from scratch elsewhere. In both cases the new host would keep the old
> DNS name but would most likely get a different IP. But of course that
> only applies to us and possibly not to all users.
>
> Thinking about the scenario you described, I see where the problem
> lies. But wouldn't the same problem also apply to the way zookeeper is
> implemented right now? Let me try to explain why (I may be missing some
> points on how zookeeper servers work internally, so corrections are
> very welcome):
> - Same scenario as you described: node A with host name a, B with host
> name b and C with host name c
> - As in your scenario, C is falsely detected as down due to some human
> error. Hence C' is brought up and assigned the same DNS name as C
> - Now rolling restarts are performed to bring in C'
> - A resolves c correctly to the new IP and connects to C', but B still
> resolves the host name c to the original address of C and hence does
> not connect (I think your scenario also requires some DNS slowness so
> that the host name c resolves to the original IP of C)
> - Now the rest of your scenario happens: update U is applied, C' gets
> slow, C recovers and A fails.
> Of course this approach also requires some DNS craziness, but if I did
> not make a mistake in my reasoning it should still be possible.
>
> PS: Wouldn't your scenario also invalidate the solution of the hbase
> guys, who use Amazon's elastic IPs to solve the same problem (see
> https://issues.apache.org/jira/browse/HBASE-2327)?
> PS2: If the approach I had in mind is not valid, do you already have a
> plan for when 3.5.0 will be released, or is there some way we could
> support you so that ZOOKEEPER-107 makes it into a release sooner?
>
> Am 16.03.2012 04:43, schrieb ext Alexander Shraer:
>>
>> Actually it's still not clear to me how you would enforce the 2x+1.
>> In ZooKeeper we can guarantee liveness (progress) only when x+1
>> servers are connected and up; safety (correctness), however, is
>> always guaranteed, even if 2 out of 3 servers are temporarily down.
>> Your design needs the 2x+1 for safety, which I think is problematic
>> unless you can accurately detect failures (synchrony) and failures
>> are permanent.
>>
>> Alex
>>
>>
>> On Mar 15, 2012, at 3:54 PM, Alexander Shraer<[EMAIL PROTECTED]>  wrote:
>>
>>> I think the concern is that the old VM can recover and try to
>>> reconnect. Theoretically you could even go back and forth between
>>> the new and the old VM. For example, suppose that you have servers
>>> A, B and C in the cluster, and A is the leader. C is slow and
>>> "replaced" with C'; then update U is acked by A and C', and then A
>>> fails. In this situation you cannot tolerate additional failures.
>>> But with the automatic replacement thing it can (theoretically)
>>> happen that C' becomes a little slow, C reconnects to B and is
>>> chosen as leader, and the committed update U is lost forever. This
>>> is perhaps unlikely but possible...
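[Editor's note: the danger in this scenario is that C and C' are distinct machines sharing one identity, so the quorum that acknowledged U and the election quorum that forms later need not intersect. A toy replay of the trace (illustrative only; this is not the real Zab protocol, and the logs are invented):]

```python
# Toy replay: update U is acked by the majority {A, C'}, but the later
# election quorum {B, C} contains neither acker, so the "committed"
# update is silently lost. "Cp" stands for the replacement server C'.

logs = {"A": ["U"], "B": [], "C": [], "Cp": ["U"]}

ack_quorum = {"A", "Cp"}  # 2 of the 3 live identities {A, B, Cp}: U commits
assert len(ack_quorum) >= 2

# A fails and C' is slow; the old C recovers under the same name and
# forms a quorum with B. Because C and C' are different machines with
# the same identity, this quorum shares no member with the ack quorum.
election_quorum = {"B", "C"}
assert not (ack_quorum & election_quorum)  # the quorums do not intersect

# The new leader is chosen from {B, C}; neither log contains U.
new_leader_log = max((logs[s] for s in election_quorum), key=len)
assert "U" not in new_leader_log  # committed update U is lost
```

With real, distinct server identities this cannot happen, because any election quorum would have to include at least one server that acked U.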
>>>
>>> Alex
>>>
>>> On Thu, Mar 15, 2012 at 1:35 PM,<[EMAIL PROTECTED]>  wrote:
>>>>
>>>> I agree with your points that any kind of VM has hard-to-predict
>>>> runtime behaviour and that participants of the zookeeper ensemble
>>>> running on a VM could fail to send keep-alives for an uncertain
>>>> amount of time. But I don't yet understand how that would break the
>>>> approach I was mentioning: