Zookeeper service is down when Leader disk is full
Laxman 2011-06-22, 08:23
Hi Everyone,
We found an issue while testing the disk-full scenario. Please validate our observations; we will log an issue if this is found to be valid.

Problem: Zookeeper does not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state.

Version: Zookeeper 3.3.3

Scenario: If the leader's disk is filled, the leader tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread.

Root Cause: this.join() is invoked on the same thread that triggered System.exit(11).

When the disk fills up, the SyncRequestProcessor thread gets a 'No space left on device' exception and invokes System.exit(11) (the logs below show this). Before the JVM exits, it runs the shutdown hook of QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits for the SyncRequestProcessor thread to finish, but that thread is itself blocked inside System.exit(11) waiting for the shutdown hook to complete, so neither can proceed.

Thread dumps: The following thread dump shows the QuorumPeerMain shutdown hook thread waiting indefinitely inside SyncRequestProcessor.shutdown().

"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1196)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
        at java.lang.Shutdown.runHooks(Shutdown.java:79)
        at java.lang.Shutdown.sequence(Shutdown.java:123)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:904)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)

Logs:

2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop!
2011-06-21 10:09:59,758 - INFO [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
2011-06-21 10:09:59,759 - INFO [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete
2011-06-21 10:10:00,000 - INFO [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop!

Thanks
Laxman
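
The join cycle described under Root Cause can be reproduced in isolation. The following is a minimal sketch, not Zookeeper code: the class name JoinCycleDemo and the thread names are hypothetical, the shutdown hook is modeled as an ordinary thread instead of being registered with Runtime.addShutdownHook, and a 200 ms timeout is added to the inner join() so that the demo terminates instead of hanging the way the real server does.

```java
class JoinCycleDemo {

    /**
     * Returns true if the "hook" thread gave up waiting while the "worker"
     * thread was still alive, i.e. the two threads were waiting on each other.
     */
    static boolean hookTimedOutWaitingForWorker() throws InterruptedException {
        final boolean[] workerStillAlive = new boolean[1];

        // Stands in for SyncThread: on a fatal error it "exits", which means
        // starting the shutdown hook and waiting for it to finish.
        Thread worker = new Thread(() -> {
            final Thread self = Thread.currentThread();

            // Stands in for the QuorumPeerMain shutdown hook, whose shutdown
            // path ends up joining the worker thread.
            Thread hook = new Thread(() -> {
                try {
                    // Assumption for the demo: a timeout. The call reported
                    // in the post has no timeout, so it would wait forever.
                    self.join(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                workerStillAlive[0] = self.isAlive();
            }, "ShutdownHook-demo");

            hook.start();
            try {
                // Mirrors System.exit() blocking until all hooks complete.
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SyncThread-demo");

        worker.start();
        worker.join();
        return workerStillAlive[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hook timed out while worker was alive: "
                + hookTimedOutWaitingForWorker());
    }
}
```

The worker cannot finish until the hook returns, and the hook cannot return until the worker finishes (or, here, until the timeout fires). Removing the timeout from self.join(200) would make the demo hang in the same pattern as the two stacks in the thread dump.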