Zookeeper, mail # dev - Two Leaders?!


Re: Two Leaders?!
Mahadev Konar 2011-12-20, 19:14
Agree with Pat. We should dig into this ASAP.

Marshall,
  Mind opening a JIRA and posting the logs to it?

thanks
mahadev

On Tue, Dec 20, 2011 at 10:17 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> Really the logs are critical here. If you can provide them it would shed
> light.
>
> Patrick
>
> On Tue, Dec 20, 2011 at 10:13 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
> > i've seen it before when the configuration files haven't been setup
> > properly. i would check the configuration. if the leader is still the
> > leader, it must have active followers connected to it, otherwise it
> > would give up leadership. i would use netstat to find out who they
> > are.
> >
> > ben
> >
> > On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen
> > <[EMAIL PROTECTED]> wrote:
> >> Zookeeper devs,
> >>
> >> I've got a cluster with 3 servers in the ensemble all running 3.4.0. After
> >> a few days of successful operation, we observed all zookeeper reads and
> >> writes began failing every time. In our log files, the error being reported
> >> is INVALID_STATE. I then telnetted to port 2181 on all three servers and
> >> was surprised to see that *two* of these servers both report they are the
> >> leader! Two of the nodes are in agreement on the Zxid, and one of the nodes
> >> is way out of whack with a much much larger Zxid. The node that all writes
> >> are flowing through is the one with the much higher Zxid.
> >>
> >> Has anyone ever seen this before? What can I do to diagnose this problem
> >> and resolve it? I was considering killing zookeeper on the node that should
> >> not be the leader (the one with the wrong Zxid) and removing the zookeeper
> >> data directory, then restarting zookeeper on that node. Any other ideas?
> >>
> >> I appreciate any help.
>
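
For anyone wanting to repeat the check Marshall describes (telnetting to port 2181 on each server and comparing the reported mode and Zxid), here is a rough Python sketch. It assumes the servers answer the `srvr` four-letter command with `Mode:` and `Zxid:` lines. The function names are made up for illustration, and the epoch split assumes the usual Zxid layout with the epoch in the high 32 bits. It is only a diagnostic aid, not a fix.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Send a four-letter command to one server and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text):
    """Pull the 'Mode:' and 'Zxid:' fields out of a srvr/stat reply."""
    info = {"mode": None, "zxid": None}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "mode":
            info["mode"] = value
        elif key == "zxid":
            info["zxid"] = int(value, 16)  # reported as hex, e.g. 0x100000002
    return info


def zxid_epoch(zxid):
    """Assumed layout: epoch in the high 32 bits, counter in the low 32."""
    return zxid >> 32


def find_leaders(reports):
    """Return the hosts whose reply says 'Mode: leader'."""
    return sorted(h for h, info in reports.items() if info["mode"] == "leader")


def check_ensemble(hosts, port=2181):
    """Query every host and return (reports, leaders)."""
    reports = {h: parse_srvr(four_letter_word(h, port)) for h in hosts}
    return reports, find_leaders(reports)
```

Calling `check_ensemble(["zk1", "zk2", "zk3"])` (hypothetical host names) returns the per-host mode and Zxid along with the list of self-declared leaders. More than one entry in that list matches the situation reported above, and comparing `zxid_epoch` across hosts shows whether the odd node is on a different epoch.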