Zookeeper user mailing list: uncaught exception handler

Jeremy Stribling 2012-04-03, 23:51
RE: uncaught exception handler
I agree we shouldn't swallow java.lang.Error. Please go ahead and open a JIRA.

Thanks!
--Michi
________________________________________
From: Jeremy Stribling [[EMAIL PROTECTED]]
Sent: Tuesday, April 03, 2012 4:51 PM
To: [EMAIL PROTECTED]
Subject: uncaught exception handler

I'm curious about the origin of the uncaught exception handler that sits
in NIOServerCnxn (looking at ZK 3.3.5).  It just logs the exception to
log.error.  I wonder if it makes sense instead to do a System.exit(1) if
the exception is an OutOfMemoryError (or perhaps a java.lang.Error in
general, since those are not supposed to be caught).
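A minimal sketch of the proposed behaviour, assuming the handler is installed with Thread.setDefaultUncaughtExceptionHandler: it keeps the existing log-the-error step but calls System.exit(1) for any java.lang.Error, OutOfMemoryError included. The class name FailFastHandler and the isFatal helper are hypothetical, not ZooKeeper code, and java.util.logging stands in for ZooKeeper's own logger so the example is self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical fail-fast handler: logs every uncaught throwable (as the
 * existing handler does) and terminates the JVM when it is a java.lang.Error.
 */
public class FailFastHandler implements Thread.UncaughtExceptionHandler {
    private static final Logger LOG = Logger.getLogger(FailFastHandler.class.getName());

    /** Errors (OutOfMemoryError, StackOverflowError, ...) are treated as fatal. */
    static boolean isFatal(Throwable t) {
        return t instanceof Error;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        LOG.log(Level.SEVERE, "Thread " + thread + " died", t);
        if (isFatal(t)) {
            // Exit non-zero so an external monitor notices and restarts the JVM.
            System.exit(1);
        }
        // Ordinary exceptions are only logged, matching the current behaviour.
    }

    public static void main(String[] args) {
        // Applies to every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(new FailFastHandler());
    }
}
```

One design choice worth weighing: System.exit runs registered shutdown hooks, which may block or need memory after an OutOfMemoryError; Runtime.getRuntime().halt(1) skips them and is the blunter alternative.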

I ask because we embed Zookeeper in a process where other code can
push the JVM to its memory limit.  Instead of trying to soldier on in
the face of adversity like this, it seems better for the whole process
to come crashing down, so that whatever monitor process is in place can
restart the JVM.  When the process just logs and ignores errors like
this, the ZK servers seem to end up unable to form a quorum, even though
they are up and running.
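The crash-and-restart arrangement described here can be sketched as a restart loop. This is an illustration under stated assumptions, not any particular monitor's behaviour: the IntSupplier stands in for "launch the JVM and wait for its exit code" (a real monitor might use ProcessBuilder or an init system), and RestartLoop, supervise and maxRestarts are hypothetical names.

```java
import java.util.function.IntSupplier;

/** Hypothetical supervisor: rerun the service after every abnormal exit. */
public final class RestartLoop {
    /**
     * Runs the service until it exits with code 0 or the restart budget is
     * spent. Returns the total number of runs (at most maxRestarts + 1).
     */
    static int supervise(IntSupplier runOnce, int maxRestarts) {
        int runs = 0;
        while (true) {
            int exitCode = runOnce.getAsInt();
            runs++;
            if (exitCode == 0 || runs > maxRestarts) {
                return runs;
            }
            // Non-zero exit, e.g. a fail-fast handler called System.exit(1):
            // loop around and start a fresh JVM.
        }
    }

    public static void main(String[] args) {
        int[] codes = {1, 1, 0};  // two crashes, then a clean shutdown
        int[] next = {0};
        int runs = supervise(() -> codes[next[0]++], 5);
        System.out.println("runs = " + runs);  // prints "runs = 3"
    }
}
```

The loop only helps if the process actually exits on a fatal error; a JVM that logs the error and keeps running never returns an exit code for the monitor to act on.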

Here's a sample backtrace I've seen:

2012-04-03 19:40:03,643 600695063 [QuorumPeer:/172.29.1.220:2888] ERROR org.apache.zookeeper.server.NIOServerCnxn  - Thread Thread[QuorumPeer:/172.29.1.220:2888,5,main] died
java.lang.OutOfMemoryError: GC overhead limit exceeded
         at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:102)
         at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232)
         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131)
         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:242)
         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:279)
         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:658)
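The trace shows the QuorumPeer thread dying while the handler only logs it. A minimal sketch of that log-and-continue behaviour, assuming an OutOfMemoryError thrown by hand rather than a real allocation failure: the worker thread terminates, the handler records the error, and the rest of the JVM keeps running. That is consistent with the symptom described above, where the process is up but the thread that would lead or follow is gone. LogOnlyDemo, runWorker and the thread name are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo of a log-only handler: the thread dies, the JVM lives on. */
public final class LogOnlyDemo {
    static Thread runWorker(AtomicReference<Throwable> seen) {
        Thread worker = new Thread(() -> {
            // Simulated failure; a real OutOfMemoryError comes from the JVM.
            throw new OutOfMemoryError("GC overhead limit exceeded");
        }, "QuorumPeer-sim");
        // Log-only handler: remember the throwable, then return normally.
        worker.setUncaughtExceptionHandler((t, e) -> seen.set(e));
        worker.start();
        try {
            worker.join();  // the handler runs before the thread terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker;
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = runWorker(seen);
        System.out.println("worker alive: " + worker.isAlive());  // prints "worker alive: false"
        System.out.println("handler saw: " + seen.get());
        System.out.println("main thread still running");
    }
}
```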

Any thoughts?  Happy to create a JIRA and possibly a patch if there's
interest.  Thanks,

Jeremy
Jeremy Stribling 2012-04-04, 17:10