Zookeeper >> mail # user >> Fast leader election initial delay, is that possible?


Sampath Perera 2011-08-18, 03:40
Ted Dunning 2011-08-18, 05:19
Sampath Perera 2011-08-18, 07:15
Ted Dunning 2011-08-18, 15:36
Ted Dunning 2011-08-18, 15:39
Flavio Junqueira 2011-08-18, 15:54
Sampath Perera 2011-08-18, 16:54
Sampath Perera 2011-08-18, 16:55
Ted Dunning 2011-08-18, 17:13
Flavio Junqueira 2011-08-19, 11:00

Re: Fast leader election initial delay, is that possible?
My two cents:
I am not sure we can distinguish between spurious and non-spurious warnings,
and I don't think we can time it well. The delay is applicable only in
certain cases. If the user knows that there will be a startup delay, then
the user can ignore those errors or modify their scripts to start the server
after a delay. Does this have to be implemented in the server? It sounds to me
like this is something that user scripts should handle.
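
To make the "user scripts" option concrete, here is a minimal sketch of a
log-side filter that drops connection warnings raised during a startup grace
window. This is not ZooKeeper code. The LogRecord type, the function names,
and the message substrings are all assumptions for illustration, and the
twenty-second default simply mirrors the figure Ted suggests further down.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical substrings; the actual ZooKeeper log messages may differ.
STARTUP_NOISE = ("Connection refused", "Cannot open channel")


@dataclass(frozen=True)
class LogRecord:
    time: datetime
    level: str
    message: str


def is_startup_noise(record, started_at, grace=timedelta(seconds=20)):
    """Return True for a WARN that matches a noise pattern inside the grace window."""
    in_window = started_at <= record.time < started_at + grace
    return (
        in_window
        and record.level == "WARN"
        and any(pattern in record.message for pattern in STARTUP_NOISE)
    )


def filter_alerts(records, started_at, grace=timedelta(seconds=20)):
    """Keep every record except startup noise; input order is preserved."""
    return [r for r in records if not is_startup_noise(r, started_at, grace)]
```

Only warnings inside the window are dropped, so a peer that is still
unreachable after the grace period keeps producing alerts, and non-warning
records are never suppressed.
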
On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> Sampath, Do you think something along the lines of what Ted describes would
> work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or
> inability to form a quorum during the first (say) twenty seconds of
> operation.
>
> The thesis is that warnings from these causes during that time are
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea.
>  I completely reboot a ZK cluster once every year or three.  When I am doing
> a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
> don't want to see those alerts, my monitoring system allows me to put a
> machine into maintenance mode for a short period of time to temporarily
> suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's
> suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that they prevent me from reading the rest, but you may have a
>> different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[EMAIL PROTECTED]> wrote:
>>
>> Hhmmm, I think this is a bit different isn't it? Here we know that the first
>> server to come will be failing to connect to the other as they are not yet
>> up. Anyway our real issue is the warning.
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server rejoining
>> a cluster.  Or you might have a cluster that has been turned off.  Or a
>> cluster with 2 out of 5 machines off and we tried to touch the other down
>> machine before the others.
>>
>> Would you like to suggest a patch?
>>
>> Of course I do.. will prepare a patch and attach.
>>
>> Great!

Sampath Perera 2011-08-20, 02:30
Sampath Perera 2011-08-20, 02:23
Flavio Junqueira 2011-08-18, 09:13