Zookeeper, mail # dev - Two Leaders?!


Marshall McMullen 2011-12-20, 17:00
Patrick Hunt 2011-12-20, 17:37
Benjamin Reed 2011-12-20, 18:13
Patrick Hunt 2011-12-20, 18:17
Mahadev Konar 2011-12-20, 19:14
Marshall McMullen 2011-12-20, 19:21
Ted Dunning 2011-12-20, 19:32
Marshall McMullen 2011-12-20, 20:24
Benjamin Reed 2011-12-20, 21:44
Marshall McMullen 2011-12-20, 22:40

Re: Two Leaders?!
Benjamin Reed 2011-12-20, 19:35
yes, this is a configuration problem. 10.10.5.35 must be running as well, right?

ben
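
The mismatch in the server lists quoted below can also be checked mechanically. Here is a minimal Python sketch, assuming each node's config has already been read into a string. The helper names server_entries and inconsistent_ids are made up for illustration and are not part of ZooKeeper; the sample strings carry only the server.N lines from the three configs in this thread.

```python
import re

# Matches lines such as "server.2=10.10.5.35:2182:2183".
SERVER_LINE = re.compile(r"^server\.(\d+)=(\S+)$")


def server_entries(cfg_text):
    """Return {server_id: "host:port:port"} for every server.N line."""
    entries = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            entries[int(match.group(1))] = match.group(2)
    return entries


def inconsistent_ids(configs):
    """configs maps a node name to its config text.

    Return {server_id: {node: value}} for every server id whose
    value is not identical in all of the given configs.
    """
    parsed = {node: server_entries(text) for node, text in configs.items()}
    all_ids = set().union(*(p.keys() for p in parsed.values()))
    diffs = {}
    for sid in sorted(all_ids):
        values = {node: p.get(sid) for node, p in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[sid] = values
    return diffs


# Only the server.N lines of the three quoted config files.
configs = {
    "10.10.5.56": "server.1=10.10.5.46:2182:2183\n"
                  "server.2=10.10.5.35:2182:2183\n"
                  "server.3=10.10.5.56:2182:2183\n",
    "10.10.5.58": "server.1=10.10.5.46:2182:2183\n"
                  "server.2=10.10.5.56:2182:2183\n"
                  "server.3=10.10.5.58:2182:2183\n",
    "10.10.5.46": "server.1=10.10.5.46:2182:2183\n"
                  "server.2=10.10.5.35:2182:2183\n"
                  "server.3=10.10.5.56:2182:2183\n",
}

for sid, values in inconsistent_ids(configs).items():
    print("server.%d differs: %s" % (sid, values))
```

On these three configs it flags server.2 and server.3, and 10.10.5.58 is the node whose list disagrees with the other two.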

On Tue, Dec 20, 2011 at 11:21 AM, Marshall McMullen
<[EMAIL PROTECTED]> wrote:
> What specific log files should I look for?
>
> I inspected the config files for all 3 nodes and they *are different*.
> Specifically, the servers specified are not consistent:
>
> $ cat /data/zookeeper/10.10.5.56/10.10.5.56_2181.cfg
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/data/zookeeper/10.10.5.56/
> maxClientCnxns=1000
> clientPortAddress=10.10.5.56
> clientPort=2181
> server.1=10.10.5.46:2182:2183
> server.2=10.10.5.35:2182:2183
> server.3=10.10.5.56:2182:2183
>
> $ cat /data/zookeeper/10.10.5.58/10.10.5.58_2181.cfg
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/data/zookeeper/10.10.5.58/
> maxClientCnxns=1000
> clientPortAddress=10.10.5.58
> clientPort=2181
> server.1=10.10.5.46:2182:2183
> server.2=10.10.5.56:2182:2183
> server.3=10.10.5.58:2182:2183
>
> $ cat /data/zookeeper/10.10.5.46/10.10.5.46_2181.cfg
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/data/zookeeper/10.10.5.46/
> maxClientCnxns=1000
> clientPortAddress=10.10.5.46
> clientPort=2181
> server.1=10.10.5.46:2182:2183
> server.2=10.10.5.35:2182:2183
> server.3=10.10.5.56:2182:2183
>
> So this looks like a configuration problem, not a zookeeper bug, correct?
>
>
> On Tue, Dec 20, 2011 at 11:17 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
>
>> Really the logs are critical here. If you can provide them it would shed
>> light.
>>
>> Patrick
>>
>> On Tue, Dec 20, 2011 at 10:13 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
>> > i've seen it before when the configuration files haven't been set up
>> > properly. i would check the configuration. if the leader is still the
>> > leader, it must have active followers connected to it, otherwise it
>> > would give up leadership. i would use netstat to find out who they
>> > are.
>> >
>> > ben
>> >
>> > On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen
>> > <[EMAIL PROTECTED]> wrote:
>> >> Zookeeper devs,
>> >>
>> >> I've got a cluster with 3 servers in the ensemble all running 3.4.0. After
>> >> a few days of successful operation, we observed all zookeeper reads and
>> >> writes began failing every time. In our log files, the error being reported
>> >> is INVALID_STATE. I then telnetted to port 2181 on all three servers and
>> >> was surprised to see that *two* of these servers both report they are the
>> >> leader! Two of the nodes are in agreement on the Zxid, and one of the nodes
>> >> is way out of whack with a much much larger Zxid. The node that all writes
>> >> are flowing through is the one with the much higher Zxid.
>> >>
>> >> Has anyone ever seen this before? What can I do to diagnose this problem
>> >> and resolve it? I was considering killing zookeeper on the node that should
>> >> not be the leader (the one with the wrong Zxid) and removing the zookeeper
>> >> data directory, then restarting zookeeper on that node. Any other ideas?
>> >>
>> >> I appreciate any help.
>>
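
The telnet check described in the original message can be scripted. Here is a rough Python sketch, assuming the servers answer the `stat` four-letter command on the client port and that the reply contains `Mode:` and `Zxid:` lines. The helper names (four_letter, parse_stat, leaders, check_ensemble) are hypothetical and not part of any ZooKeeper client library.

```python
import socket


def four_letter(host, port=2181, cmd=b"stat", timeout=5.0):
    """Send a four-letter command and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Pick the Mode and Zxid fields out of a stat reply."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Mode", "Zxid"):
            fields[key.strip()] = value.strip()
    return fields


def leaders(reports):
    """Return the hosts whose parsed reply says Mode: leader."""
    return [host for host, f in reports.items() if f.get("Mode") == "leader"]


def check_ensemble(hosts, port=2181):
    """Query every host, print its mode and zxid, and return the parsed reports."""
    reports = {h: parse_stat(four_letter(h, port)) for h in hosts}
    for host, fields in reports.items():
        print(host, fields.get("Mode"), fields.get("Zxid"))
    if len(leaders(reports)) > 1:
        print("more than one server reports itself as leader:", leaders(reports))
    return reports
```

With the addresses from the config file names in this thread, the call would be check_ensemble(["10.10.5.46", "10.10.5.56", "10.10.5.58"]). Comparing the Zxid values side by side shows which node has diverged from the other two.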