Zookeeper >> mail # dev >> Zookeeper service is down when Leader disk is full


Zookeeper service is down when Leader disk is full
Hi Everyone,

  

We found an issue while testing the disk-full scenario. Please validate
our observations; we will log an issue if they are found to be valid.

 

Problem: ZooKeeper does not shut down completely when the dataDir disk is
full, and the ZK cluster goes into an unserviceable state.
Version: Zookeeper 3.3.3

 

Scenario:
If the disk of the leader ZooKeeper is filled up, the server tries to shut
down, but it waits indefinitely while shutting down the
SyncRequestProcessor thread.

Root cause: this.join() is invoked on the same thread that triggered
System.exit(11).
When the disk fills up, the SyncRequestProcessor thread gets the exception
'No space left on device' and invokes System.exit(11) (the logs below show
this). Before the JVM exits, it runs the shutdown hook of QuorumPeerMain,
and the flow reaches SyncRequestProcessor.shutdown(). There, this.join()
waits for the very thread that called System.exit(11), while that thread is
itself waiting for the shutdown hook to finish, so neither can proceed.
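
The cycle described above can be reproduced in isolation. The following is
only a minimal sketch, not ZooKeeper code: the class ShutdownJoinDemo, the
method shutdownCompletes, and the 100 ms bound are all hypothetical names
and values chosen for illustration. A "sync" thread starts a "hook" thread
and waits for it (standing in for System.exit() waiting on shutdown hooks),
while the hook joins the sync thread (standing in for this.join() in
SyncRequestProcessor.shutdown()). A bounded join is shown as one possible
way to break the cycle; it is an assumption, not necessarily the fix the
project would choose.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; none of these names come from ZooKeeper.
class ShutdownJoinDemo {

    /**
     * Simulates the join cycle and reports whether the "sync" thread
     * manages to finish within timeoutMs.
     */
    static boolean shutdownCompletes(boolean boundedJoin, long timeoutMs) {
        AtomicReference<Thread> syncRef = new AtomicReference<>();

        // Stand-in for the shutdown hook: it joins the sync thread,
        // as SyncRequestProcessor.shutdown() does with this.join().
        Thread hook = new Thread(() -> {
            try {
                if (boundedJoin) {
                    syncRef.get().join(100); // give up after 100 ms
                } else {
                    syncRef.get().join();    // waits forever in the cycle
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hook");

        // Stand-in for the sync thread: after a fatal error it starts the
        // hook and waits for it, as System.exit() waits for shutdown hooks.
        Thread sync = new Thread(() -> {
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sync");

        syncRef.set(sync);
        hook.setDaemon(true);
        sync.setDaemon(true);
        sync.start();

        try {
            sync.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        boolean finished = !sync.isAlive();

        // Break the cycle so the demo threads do not linger.
        sync.interrupt();
        hook.interrupt();
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("unbounded join finished: "
                + shutdownCompletes(false, 500));
        System.out.println("bounded join finished:   "
                + shutdownCompletes(true, 2000));
    }
}
```

With the unbounded join, the two threads should still be waiting on each
other when the timeout expires, which mirrors the two stacks in the thread
dumps below. With the bounded join, the hook gives up, both threads finish,
and the simulated shutdown completes.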

Thread dumps:

The following thread dump shows the QuorumPeerMain shutdown hook thread
waiting indefinitely inside SyncRequestProcessor.shutdown().

"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1196)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
        at java.lang.Shutdown.runHooks(Shutdown.java:79)
        at java.lang.Shutdown.sequence(Shutdown.java:123)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:904)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)

Logs:
2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop!
2011-06-21 10:09:59,758 - INFO  [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
2011-06-21 10:09:59,759 - INFO  [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete
2011-06-21 10:10:00,000 - INFO  [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop!

 
Thanks
Laxman