Zookeeper >> mail # user >> Fast leader election initial delay, is that possible?


Re: Fast leader election initial delay, is that possible?
Sampath, do you think something along the lines of what Ted describes
would work for you?

-Flavio

On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:

> The thought is that a server would not complain about connection  
> refused or inability to form a quorum during the first (say) twenty  
> seconds of operation.
>
> The thesis is that warnings from these causes during that time are  
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a  
> good idea.  I completely reboot a ZK cluster once every year or  
> three.  When I am doing a rolling upgrade, I *want* to see alerts  
> when I bounce a machine.  If I don't want to see those alerts, my  
> monitoring system allows me to put a machine into maintenance mode  
> for a short period of time to temporarily suppress the warnings.
>
> All I was doing was translating and elaborating the original  
> poster's suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
> Hi Ted, I don't see how one can automate the distinction between a  
> machine that is down because it crashed and a machine that is down  
> because it hasn't started yet. Assuming that we are logging the  
> machine unavailability as we are doing currently, one can always  
> look at the timestamp of the warning and remember that this is the  
> time the machines were bootstrapping. Consequently, I don't really  
> see the point of reducing the number of warnings, unless the  
> warnings are really polluting the logs. I typically don't see so
> many that they prevent me from reading the rest, but you may have a
> different perception. Also, recall that we back off, so the warnings  
> become less frequent over time.
>
> I'm open to ideas, though. If you see anything wrong in my rationale  
> or if you have an idea of how to do it differently, then I'd be
> happy to hear it. However, if the idea is simply to add a parameter
> that configures the time for leader election to start, then I'm  
> currently not in favor.
>
> -Flavio
>
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many of these warnings are to be expected, and there is a heuristic that
>> might assist in distinguishing some of these cases so that false alarms
>> in the logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different itches.  In my experience, restarting a ZK cluster from zero
>> almost never happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hhmmm, I think this is a bit different, isn't it? Here we know that the
>>>> first server to come up will fail to connect to the others, as they are
>>>> not yet up. Anyway, our real issue is the warning.
>>>>
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server
>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch the
>>> other down machine before the others.
>>>
>>>
>>>>>
>>>>> Would you like to suggest a patch?
>>>>>
>>>>
>>>> Of course I do. I will prepare a patch and attach it.
>>>>
>>>
>>> Great!
>>>
>>>

flavio
junqueira

research scientist

[EMAIL PROTECTED]
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301
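
The heuristic Ted describes in the quoted text (stay quiet about connection
refused or a missing quorum for roughly the first twenty seconds after a
server starts) could be sketched as follows. This is only an illustration
of the idea, not ZooKeeper code and not a proposed patch: the class name
StartupGraceFilter, the should_warn method and the injectable clock are all
hypothetical, and the 20-second default is simply the figure from the
thread.

```python
import time


class StartupGraceFilter:
    """Decide whether a connection warning should be reported.

    Hypothetical helper for illustration only; nothing like this is
    claimed to exist in ZooKeeper.
    """

    def __init__(self, grace_seconds=20.0, clock=time.monotonic):
        # The clock is injectable so the behaviour can be checked without
        # sleeping; time.monotonic is unaffected by wall-clock changes.
        self._clock = clock
        self._started_at = clock()
        self._grace_seconds = grace_seconds

    def should_warn(self):
        # Inside the grace window a failed connection is treated as
        # expected startup noise; after it, warnings are reported as usual.
        elapsed = self._clock() - self._started_at
        return elapsed >= self._grace_seconds
```

Note that a filter like this cannot tell a peer that has crashed from a
peer that has not started yet, which is the objection raised in the quoted
text: a real failure inside the window would be hidden in exactly the same
way as the expected ones.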

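
The remark in the thread that "we back off, so the warnings become less
frequent over time" refers to the usual pattern of lengthening the wait
between attempts. A minimal sketch of capped exponential backoff is shown
below. The function name backoff_delays and the default values for
initial, cap and factor are assumptions chosen for illustration; they are
not taken from the thread and should not be read as ZooKeeper's actual
settings.

```python
def backoff_delays(initial=0.2, cap=60.0, factor=2.0):
    """Yield successive retry delays in seconds, growing up to a cap.

    Illustrative only: the defaults are assumed values, not ZooKeeper's.
    """
    delay = initial
    while True:
        # Never wait longer than the cap, however many attempts have failed.
        yield min(delay, cap)
        delay = min(delay * factor, cap)
```

If one warning is logged per failed attempt, the gap between warnings
grows with each retry until it reaches the cap, so a peer that stays down
produces fewer and fewer log lines per minute.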