Zookeeper, mail # user - Fast leader election initial delay, is that possible?


Re: Fast leader election initial delay, is that possible?
Vishal Kher 2011-08-19, 20:13
My few cents..
I am not sure if we can distinguish between spurious/non-spurious warnings
and I don't think we can time it well. The delay is applicable only in
certain cases. If the user knows that there will be a startup delay, then
the user can ignore those errors or modify their scripts to start the server
after a delay. Does this have to be implemented in the server? It sounds to me
that this is something that user scripts should handle.
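
As a rough illustration of the script-side approach, here is a minimal sketch of a log filter that treats connection warnings as expected only during a short window after startup. Everything in it is an assumption made for illustration: the function name, the 20-second window (the figure floated earlier in the thread), the marker strings, and the log line layout are hypothetical and not taken from ZooKeeper itself.

```python
from datetime import datetime, timedelta

# Hypothetical values: the 20-second window echoes the figure mentioned in
# the thread; the marker strings are illustrative, not exact log text.
GRACE_PERIOD = timedelta(seconds=20)
STARTUP_MARKERS = ("Connection refused", "Cannot open channel")


def is_expected_startup_warning(line, server_start, grace=GRACE_PERIOD):
    """Return True if a WARN line falls inside the startup window and
    mentions one of the connection-related markers."""
    # Assumed line layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL message"
    parts = line.split(" ", 3)
    if len(parts) < 4:
        return False
    date_part, time_part, level, message = parts
    try:
        stamp = datetime.strptime(
            f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S,%f"
        )
    except ValueError:
        # Not a timestamped line, so it cannot be classified.
        return False
    if level != "WARN":
        return False
    in_window = server_start <= stamp <= server_start + grace
    return in_window and any(marker in message for marker in STARTUP_MARKERS)
```

A monitoring script could record the time it launched the server, pass that as `server_start`, and alert only on warnings for which this function returns False. Note that this only relabels warnings after the fact; it cannot tell a peer that crashed from one that has not started yet.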
On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> Sampath, Do you think something along the lines of what Ted describes would
> work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or
> inability to form a quorum during the first (say) twenty seconds of
> operation.
>
> The thesis is that warnings from these causes during that time are
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea.
>  I completely reboot a ZK cluster once every year or three.  When I am doing
> a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
> don't want to see those alerts, my monitoring system allows me to put a
> machine into maintenance mode for a short period of time to temporarily
> suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's
> suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that they prevent me from reading the rest, but you may
>> have a different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[EMAIL PROTECTED]> wrote:
>>
>> Hhmmm, I think this is a bit different isn't it? Here we know that the first
>> server to come will be failing to connect to the other as they are not yet
>> up. Anyway our real issue is the warning.
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server rejoining
>> a cluster.  Or you might have a cluster that has been turned off.  Or a
>> cluster with 2 out of 5 machines off and we tried to touch the other down
>> machine before the others.
>>
>> Would you like to suggest a patch?
>>
>> Of course I do.. will prepare a patch and attach.
>>
>> Great!