Sampath Perera 2011-08-18, 03:40
Ted Dunning 2011-08-18, 05:19
Sampath Perera 2011-08-18, 07:15
Ted Dunning 2011-08-18, 15:36
Ted Dunning 2011-08-18, 15:39
Flavio Junqueira 2011-08-18, 15:54
Sampath Perera 2011-08-18, 16:54
Sampath Perera 2011-08-18, 16:55
Ted Dunning 2011-08-18, 17:13
Sampath, do you think something along the lines of what Ted describes
would work for you?
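
To make the idea concrete, here is a minimal sketch of the startup grace
period Ted describes in the quoted text, written in Java. Everything in it
is an assumption for illustration: the class name StartupGrace, the
injected clock, and the "DEBUG"/"WARN" strings are hypothetical and are not
part of ZooKeeper. It only shows the heuristic itself: connection-refused
or no-quorum messages are downgraded while the server has been up for less
than the grace period (say, twenty seconds), and reported as warnings
afterwards.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, not a ZooKeeper class: decides how loudly to report
 * a connection-refused or no-quorum condition, based on time since startup.
 */
final class StartupGrace {
    private final long startMillis;   // time the server process started
    private final long graceMillis;   // length of the quiet period
    private final LongSupplier clock; // injected so the logic can be tested

    StartupGrace(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
        this.startMillis = clock.getAsLong();
    }

    /** True while the server is still inside its startup grace period. */
    boolean inGracePeriod() {
        return clock.getAsLong() - startMillis < graceMillis;
    }

    /** Log level to use for a connect failure observed right now. */
    String levelForConnectFailure() {
        return inGracePeriod() ? "DEBUG" : "WARN";
    }
}
```

In a real server the clock would presumably be System::currentTimeMillis
and the grace period a fixed constant or a configuration value. Note that
this does not tell a crashed peer apart from one that has not started yet;
it only keeps the log quiet during the window in which the second case is
the likely one.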
On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
> The thought is that a server would not complain about connection
> refused or inability to form a quorum during the first (say) twenty
> seconds of operation.
> The thesis is that warnings from these causes during that time are
> As I mentioned, I don't see this as urgent or even necessarily a
> good idea. I completely reboot a ZK cluster once every year or
> three. When I am doing a rolling upgrade, I *want* to see alerts
> when I bounce a machine. If I don't want to see those alerts, my
> monitoring system allows me to put a machine into maintenance mode
> for a short period of time to temporarily suppress the warnings.
> All I was doing was translating and elaborating the original
> poster's suggestion, not so much endorsing it.
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira
> <fpj@yahoo-inc.com> wrote:
> Hi Ted, I don't see how one can automate the distinction between a
> machine that is down because it crashed and a machine that is down
> because it hasn't started yet. Assuming that we are logging the
> machine unavailability as we are doing currently, one can always
> look at the timestamp of the warning and remember that this is the
> time the machines were bootstrapping. Consequently, I don't really
> see the point of reducing the number of warnings, unless the
> warnings are really polluting the logs. I typically don't see so
> many that they prevent me from reading the rest, but you may have a
> different perception. Also, recall that we back off, so the warnings
> become less frequent over time.
> I'm open to ideas, though. If you see anything wrong in my rationale
> or if you have an idea of how to do it differently, then I'd be
> happy to hear. However, if the idea is simply to add a parameter
> that configures the time for leader election to start, then I'm
> currently not in favor.
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>> What you say is correct, but the original poster does have a point
>> that many of these warnings are to be expected, and there is a
>> heuristic that could assist in distinguishing some of these cases so
>> that false alarms in the logs could be decreased.
>> That doesn't seem like a big deal to me, but different people have
>> itches. In my experience, restarting a ZK cluster from zero almost
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning
>> <[EMAIL PROTECTED]> wrote:
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[EMAIL PROTECTED]> wrote:
>>>> Hmm, I think this is a bit different, isn't it? Here we know that the
>>>> server coming up will fail to connect to the others, as they are not
>>>> yet up. Anyway, our real issue is the warning.
>>> We know that.
>>> But how does the server know that it is the first server? That is the
>>> whole point of the leader election. You might just have a server
>>> a cluster. Or you might have a cluster that has been turned off.
>>> Or a cluster with 2 out of 5 machines off and we tried to touch the
>>> other down machine before the others.
>>>>> Would you like to suggest a patch?
>>>> Of course I do. I will prepare a patch and attach it.
> research scientist
> [EMAIL PROTECTED]
> direct +34 93-183-8828
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300 fax (408) 349 3301
Vishal Kher 2011-08-19, 20:13
Sampath Perera 2011-08-20, 02:30
Sampath Perera 2011-08-20, 02:23
Flavio Junqueira 2011-08-18, 09:13