NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Zookeeper >> mail # user >> Failure to rejoin ensemble after reboot


Marshall McMullen 2012-07-09, 04:44
Re: Failure to rejoin ensemble after reboot
That is very strange. What do the logs of the misbehaving server say? What
do the logs of the other servers say? What does a stack dump of the
misbehaving server look like?
Also, just to clarify, if you don't do anything but fully stop and restart
the cluster (no deleting version-2 files etc) the whole ensemble will
reform successfully?

C
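
One quick way to gather part of the state asked about above is to send ZooKeeper's four-letter "srvr" command to each server's client port and see which mode, if any, each one reports. The following is a minimal sketch, not a definitive diagnostic: the host names are hypothetical placeholders, port 2181 is assumed to be the client port, and it assumes four-letter commands are enabled on the servers (newer releases may require whitelisting them).

```python
import socket

# Hypothetical ensemble members (assumption); replace with the real
# host names and client ports of your three servers.
SERVERS = [
    ("zk1.example.com", 2181),
    ("zk2.example.com", 2181),
    ("zk3.example.com", 2181),
]


def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a four-letter command and return the server's raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line, or None if there is none.

    A server that has not joined a quorum typically answers without a
    'Mode:' line, so None is a hint that it is not serving requests.
    """
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(servers):
    """Map 'host:port' to the reported mode, or to an error description."""
    modes = {}
    for host, port in servers:
        key = f"{host}:{port}"
        try:
            modes[key] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # refused, timed out, unresolvable, ...
            modes[key] = f"unreachable ({exc})"
    return modes
```

Calling ensemble_modes(SERVERS) and printing the result gives one line of state per server. A healthy three-node ensemble should show one leader and two followers; a value of None or "unreachable" for the rebooted node narrows down where to look in its logs and stack dump.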

On Mon, Jul 9, 2012 at 12:44 AM, Marshall McMullen <
[EMAIL PROTECTED]> wrote:

> I'm trying to get to the bottom of a problem we're seeing where after I
> forcibly reboot an ensemble node (running on Linux) via "reboot -f" it is
> unable to rejoin the ensemble and no clients can connect to it. Has anyone
> ever seen a problem like this before?
>
> I have been investigating this under
> https://issues.apache.org/jira/browse/ZOOKEEPER-1453 as on the surface it
> looked like there was some sort of transaction/log corruption going on. But
> now I'm not so sure of that.
>
> What bothers me the most right now is that I am unable to reliably get the
> node in question to rejoin the ensemble. I've removed the contents of the
> "version-2" directory and restarted zookeeper to no avail. It regenerates
> an epoch file but never obtains the new database from a peer. I even went
> so far as to copy the on-disk database from another node and restart
> zookeeper, and I still can't get it to rejoin the ensemble. I've also
> seen anomalous behavior where, once it is in this failed state, stopping
> all three zookeeper server processes entirely and then starting them all
> back up makes everything connect, with all three nodes in the
> ensemble. But this really shouldn't be necessary.
>
> None of this matches the behavior I expected. If anyone has any insight,
> it would be greatly appreciated.
>