Zookeeper >> mail # user >> Fast leader election initial delay, is that possible?


Re: Fast leader election initial delay, is that possible?
Sampath, do you think something along the lines of what Ted describes
would work for you?

-Flavio

On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:

> The thought is that a server would not complain about connection  
> refused or inability to form a quorum during the first (say) twenty  
> seconds of operation.
>
> The thesis is that warnings from these causes during that time are  
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a  
> good idea.  I completely reboot a ZK cluster once every year or  
> three.  When I am doing a rolling upgrade, I *want* to see alerts  
> when I bounce a machine.  If I don't want to see those alerts, my  
> monitoring system allows me to put a machine into maintenance mode  
> for a short period of time to temporarily suppress the warnings.
>
> All I was doing was translating and elaborating the original  
> poster's suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
> Hi Ted, I don't see how one can automate the distinction between a  
> machine that is down because it crashed and a machine that is down  
> because it hasn't started yet. Assuming that we are logging the  
> machine unavailability as we are doing currently, one can always  
> look at the timestamp of the warning and remember that this is the  
> time the machines were bootstrapping. Consequently, I don't really  
> see the point of reducing the number of warnings, unless the  
> warnings are really polluting the logs. I typically don't see so
> many that they prevent me from reading the rest, but you may have a
> different perception. Also, recall that we back off, so the warnings  
> become less frequent over time.
>
> I'm open to ideas, though. If you see anything wrong in my rationale  
> or if you have an idea of how to do it differently, then I'd be  
> happy to hear. However, if the idea is simply to add a parameter  
> that configures the time for leader election to start, then I'm  
> currently not in favor.
>
> -Flavio
>
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many of these warnings are to be expected and there is a heuristic that
>> might assist in distinguishing some of these cases so that false alarms
>> in the logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different itches.  In my experience, restarting a ZK cluster from zero
>> almost never happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning  
>> <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>>> first server to come will be failing to connect to the other as they
>>>> are not yet up. Anyway our real issue is the warning.
>>>>
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server
>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>>> the other down machine before the others.
>>>
>>>
>>>>>
>>>>> Would you like to suggest a patch?
>>>>>
>>>>
>>>> Of course I would. I will prepare a patch and attach it.
>>>>
>>>
>>> Great!
>>>
>>>
>
> flavio
> junqueira
>
> research scientist
>
> [EMAIL PROTECTED]
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>
>
>
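
For reference, the heuristic Ted describes above (stay quiet about connection failures during the first twenty seconds or so of operation) could be sketched roughly as follows. This is only an illustration in Python, not ZooKeeper code: the class name StartupGraceFilter, the should_warn method, and the injectable clock are all hypothetical, and the 20-second default is taken from Ted's "(say) twenty seconds".

```python
import time


class StartupGraceFilter:
    """Hypothetical helper: suppresses warnings during a startup grace period."""

    def __init__(self, grace_seconds=20.0, clock=time.monotonic):
        # The clock is injectable so the behaviour can be checked without sleeping.
        self._clock = clock
        self._started_at = clock()
        self._grace_seconds = grace_seconds

    def should_warn(self):
        """Return True once the grace period since construction has elapsed."""
        return self._clock() - self._started_at >= self._grace_seconds


if __name__ == "__main__":
    grace = StartupGraceFilter(grace_seconds=20.0)
    # Right after startup a "connection refused" would be treated as expected.
    if grace.should_warn():
        print("WARN: cannot reach peer")
    else:
        print("peer unreachable, still within startup grace period")
```

Note that such a filter only knows how long this process has been running. As discussed earlier in the thread, it cannot tell a peer that has not started yet from a peer that has crashed.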

flavio
junqueira

research scientist

[EMAIL PROTECTED]
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301
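
To illustrate the back-off point made in the quoted message above (retries are spaced further apart over time, so warnings become less frequent), here is a rough sketch in Python. It is not taken from the ZooKeeper source: the function name backoff_delays and all of the numeric defaults are assumptions chosen only for the example.

```python
def backoff_delays(initial=0.2, factor=2.0, cap=60.0, attempts=10):
    """Return the waits (in seconds) before each retry, growing up to a cap.

    All parameter values are illustrative assumptions, not ZooKeeper settings.
    """
    delay = initial
    delays = []
    for _ in range(attempts):
        delays.append(delay)
        # Grow the wait geometrically, but never beyond the cap.
        delay = min(delay * factor, cap)
    return delays


if __name__ == "__main__":
    # Each failed attempt would log one warning; the gaps between them widen.
    for attempt, wait in enumerate(backoff_delays(initial=1.0, cap=8.0, attempts=6), 1):
        print(f"attempt {attempt}: wait {wait}s before retrying")
```

With a scheme like this, most warnings cluster in the first few seconds after startup and then thin out, which is why the timestamp is usually enough to recognise them as bootstrap noise.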
