ZooKeeper dev mailing list: Two Leaders?!

Re: Two Leaders?!
What specific log files should I look for?

I inspected the config files for all 3 nodes and they *are* different.
Specifically, the servers specified are not consistent:

$ cat /data/zookeeper/10.10.5.56/10.10.5.56_2181.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/10.10.5.56/
maxClientCnxns=1000
clientPortAddress=10.10.5.56
clientPort=2181
server.1=10.10.5.46:2182:2183
server.2=10.10.5.35:2182:2183
server.3=10.10.5.56:2182:2183

$ cat /data/zookeeper/10.10.5.58/10.10.5.58_2181.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/10.10.5.58/
maxClientCnxns=1000
clientPortAddress=10.10.5.58
clientPort=2181
server.1=10.10.5.46:2182:2183
server.2=10.10.5.56:2182:2183
server.3=10.10.5.58:2182:2183

$ cat /data/zookeeper/10.10.5.46/10.10.5.46_2181.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/10.10.5.46/
maxClientCnxns=1000
clientPortAddress=10.10.5.46
clientPort=2181
server.1=10.10.5.46:2182:2183
server.2=10.10.5.35:2182:2183
server.3=10.10.5.56:2182:2183

So this looks like a configuration problem, not a ZooKeeper bug, correct?
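
For comparison, a consistent setup (assuming the intended ensemble is
10.10.5.46, 10.10.5.56 and 10.10.5.58, which is only a guess on my part)
would have the exact same server lines in all three config files, and each
host's dataDir/myid would contain the id from its own server.N line.
Roughly:

# identical in every node's .cfg
server.1=10.10.5.46:2182:2183
server.2=10.10.5.56:2182:2183
server.3=10.10.5.58:2182:2183

# on 10.10.5.58, dataDir/myid (i.e. /data/zookeeper/10.10.5.58/myid) would then contain: 3
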
On Tue, Dec 20, 2011 at 11:17 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> Really the logs are critical here. If you can provide them, it would
> shed light.
>
> Patrick
>
> On Tue, Dec 20, 2011 at 10:13 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
> > I've seen it before when the configuration files haven't been set up
> > properly. I would check the configuration. If the leader is still the
> > leader, it must have active followers connected to it; otherwise it
> > would give up leadership. I would use netstat to find out who they
> > are.
> >
> > ben
> >
> > On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen
> > <[EMAIL PROTECTED]> wrote:
> >> Zookeeper devs,
> >>
> >> I've got a cluster with 3 servers in the ensemble all running 3.4.0.
> >> After a few days of successful operation, we observed all zookeeper
> >> reads and writes began failing every time. In our log files, the
> >> error being reported is INVALID_STATE. I then telnetted to port 2181
> >> on all three servers and was surprised to see that *two* of these
> >> servers both report they are the leader! Two of the nodes are in
> >> agreement on the Zxid, and one of the nodes is way out of whack with
> >> a much much larger Zxid. The node that all writes are flowing through
> >> is the one with the much higher Zxid.
> >>
> >> Has anyone ever seen this before? What can I do to diagnose this
> >> problem and resolve it? I was considering killing zookeeper on the
> >> node that should not be the leader (the one with the wrong Zxid) and
> >> removing the zookeeper data directory, then restarting zookeeper on
> >> that node. Any other ideas?
> >>
> >> I appreciate any help.
>
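
A rough sketch of the two checks discussed above, using the ports from the
configs earlier in the thread (2181 client, 2182 quorum); adjust hosts and
tools for your environment:

# ask each server who it thinks it is and what Zxid it has
$ echo stat | nc 10.10.5.46 2181 | egrep 'Mode|Zxid'
$ echo stat | nc 10.10.5.56 2181 | egrep 'Mode|Zxid'
$ echo stat | nc 10.10.5.58 2181 | egrep 'Mode|Zxid'

# on a box claiming leadership, ESTABLISHED connections on the quorum
# port should show which followers are actually connected to it
$ netstat -an | grep 2182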