Zookeeper >> mail # dev >> Two Leaders?!


Re: Two Leaders?!
Agree with Pat. We should dig into this ASAP.

Marshall,
  Mind opening a JIRA and posting the logs to it?

thanks
mahadev

On Tue, Dec 20, 2011 at 10:17 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> Really the logs are critical here. If you can provide them it would shed
> light.
>
> Patrick
>
> On Tue, Dec 20, 2011 at 10:13 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
> > i've seen it before when the configuration files haven't been set up
> > properly. i would check the configuration. if the leader is still the
> > leader, it must have active followers connected to it, otherwise it
> > would give up leadership. i would use netstat to find out who they
> > are.
> >
> > ben
> >
> > On Tue, Dec 20, 2011 at 9:00 AM, Marshall McMullen
> > <[EMAIL PROTECTED]> wrote:
> >> Zookeeper devs,
> >>
> >> I've got a cluster with 3 servers in the ensemble all running 3.4.0. After
> >> a few days of successful operation, we observed all zookeeper reads and
> >> writes began failing every time. In our log files, the error being reported
> >> is INVALID_STATE. I then telnetted to port 2181 on all three servers and
> >> was surprised to see that *two* of these servers both report they are the
> >> leader! Two of the nodes are in agreement on the Zxid, and one of the nodes
> >> is way out of whack with a much much larger Zxid. The node that all writes
> >> are flowing through is the one with the much higher Zxid.
> >>
> >> Has anyone ever seen this before? What can I do to diagnose this problem
> >> and resolve it? I was considering killing zookeeper on the node that should
> >> not be the leader (the one with the wrong Zxid) and removing the zookeeper
> >> data directory, then restarting zookeeper on that node. Any other ideas?
> >>
> >> I appreciate any help.
>
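
The manual check described in the original report (connecting to port 2181 on each server and comparing the reported mode and Zxid) could be scripted roughly as follows. This is only a sketch, assuming each server answers the `srvr` four-letter command on its client port with `Mode:` and `Zxid:` lines. The hostnames `zk1`, `zk2`, `zk3` are placeholders, and the helper names (`four_letter_word`, `parse_srvr`, `find_leaders`) are made up for illustration, not part of ZooKeeper.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text):
    """Pull the Mode and Zxid fields out of 'srvr' (or 'stat') output."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Mode":
            info["mode"] = value
        elif key == "Zxid":
            info["zxid"] = int(value, 16)  # reported as hex, e.g. 0x500000003
    return info


def find_leaders(reports):
    """Return the hosts whose reported mode is 'leader'."""
    return sorted(h for h, info in reports.items() if info.get("mode") == "leader")


if __name__ == "__main__":
    hosts = ["zk1", "zk2", "zk3"]  # placeholder hostnames
    reports = {h: parse_srvr(four_letter_word(h)) for h in hosts}
    for host, info in sorted(reports.items()):
        zxid = info.get("zxid")
        shown = hex(zxid) if zxid is not None else "unknown"
        print(host, info.get("mode", "unknown"), shown)
    leaders = find_leaders(reports)
    if len(leaders) > 1:
        print("WARNING: more than one server reports leader:", ", ".join(leaders))
```

Running this against the ensemble prints one line per server, which makes a disagreement in mode or Zxid easy to spot. It only reports what each server claims; the logs are still needed to explain why.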