Zookeeper >> mail # user >> ZooKeeper Cluster Crash resulted in not loadable database


Re: ZooKeeper Cluster Crash resulted in not loadable database
You can try running them through org.apache.zookeeper.server.LogFormatter
and see what comes out. That's where I would start.
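
As a rough sketch of what that could look like (in Python, just to drive the
JVM): the snippet below collects the transaction logs from one server's data
directory, orders them by the hex zxid in the file name, and builds a
LogFormatter command line for each. The jar names in CLASSPATH and the
helper names txn_logs and formatter_command are assumptions for
illustration, not anything shipped with ZooKeeper; adjust them to your
install.

```python
import os

# Assumption: jar names/locations as in a stock 3.3.4 tarball; adjust to your install.
CLASSPATH = os.pathsep.join(["zookeeper-3.3.4.jar", "lib/log4j-1.2.15.jar"])
FORMATTER = "org.apache.zookeeper.server.LogFormatter"


def txn_logs(version_dir):
    """Return transaction log paths in version_dir, ordered by their hex zxid suffix."""
    found = []
    for name in os.listdir(version_dir):
        prefix, _, suffix = name.partition(".")
        if prefix != "log":
            continue  # skip snapshots and anything else
        try:
            zxid = int(suffix, 16)
        except ValueError:
            continue  # not a log.<hex zxid> file
        found.append((zxid, os.path.join(version_dir, name)))
    return [path for _, path in sorted(found)]


def formatter_command(log_path):
    """Build the argv list that runs LogFormatter on a single transaction log."""
    return ["java", "-cp", CLASSPATH, FORMATTER, log_path]
```

Each command can then be handed to subprocess.run() with stdout redirected
to a file per server, for example for every path returned by
txn_logs("<dataDir>/version-2"), so the dumps from the different servers
can be compared side by side.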

C

On Wed, Sep 5, 2012 at 3:43 AM, Gunnar Wagenknecht
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm investigating a crash of a ZooKeeper 3.3.4 cluster. It seems that
> the cause of the crash was an issue in the networking layer. All the ZK
> servers suddenly lost connections to clients as well as to each
> other. Only a few seconds later, all ZooKeeper servers had issues
> loading their database because of the following exception.
>
> ERROR [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@224]
> Failed to increment parent cversion for: /a/b/c
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /a/b/c
> at DataTree.incrementCversion(DataTree.java:1218)
> at FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:222)
> at FileTxnSnapLog.restore(FileTxnSnapLog.java:150)
> at ZKDatabase.loadDataBase(ZKDatabase.java:222)
> at QuorumPeer.getLastLoggedZxid(QuorumPeer.java:493)
> at FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:632)
> at FastLeaderElection.lookForLeader(FastLeaderElection.java:660)
> at QuorumPeer.run(QuorumPeer.java:622)
>
> WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@497]
> Unable to load database
>
> Note that the path "/a/b/c" was different on all servers. Thus, each
> server tried to restore a different transaction.
>
> The only way I was able to bring the cluster back online was to delete
> all the transaction logs on all servers and start with the latest snapshot.
>
> I have all the logs and snapshots available for investigation. Are there
> any tools to help an investigation? I'd like to find out how such a
> network outage could possibly cause such an inconsistent/unstable state
> in the system. I noticed a few stability fixes in 3.3.5/3.3.6. Thus, an
> upgrade is already scheduled.
>
> Any help is appreciated.
>
> -Gunnar
>
>
>
> --
> Gunnar Wagenknecht
> [EMAIL PROTECTED]
> http://wagenknecht.org/
>
>