Zookeeper >> mail # user >> Connection closed exceptions with slow fsync and CancelledKeyExceptions


Re: Connection closed exceptions with slow fsync and CancelledKeyExceptions
What are your filesystem and vm.dirty_ratio settings? Maybe your OS sometimes
flushes a lot of dirty pages at once. See e.g.
http://www.sysxperts.com/home/announce/vmdirtyratioandvmdirtybackgroundratio
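
For reference, a minimal sketch of how the current values could be read on a Linux host. It assumes the standard /proc/sys/vm interface; the function name and the `base` parameter are only illustrative, not part of any ZooKeeper tooling.

```python
from pathlib import Path


def read_vm_dirty_settings(base="/proc/sys/vm"):
    """Return the kernel writeback thresholds as a dict of ints.

    `base` is a parameter only so the function can be pointed at another
    directory; on a normal Linux host the default is the real location.
    A value of None means the file was missing or unreadable.
    """
    settings = {}
    for name in ("dirty_ratio", "dirty_background_ratio"):
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_vm_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

The same values are also shown by `sysctl vm.dirty_ratio vm.dirty_background_ratio`.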
2012/11/7 mattgordon <[EMAIL PROTECTED]>

> We have been trying to understand why our ZooKeeper cluster will occasionally
> have a wave of connection closed exceptions. We have switched to
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode for garbage collection with
> no noticeable improvements.
>
> The symptoms are:
>
> (1) All nodes show messages like "fsync-ing the write ahead log in
> SyncThread:0 took 6309ms which will adversely effect operation latency. See
> the ZooKeeper troubleshooting guide" with times typically around 5 seconds.
> At least once, this fsync appeared in the leader's log immediately before a
> wave of:
>
> ERROR [CommitProcessor:0:NIOServerCnxn@445] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:171)
>         at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>
> Our clients received ZookeeperConnectionClosed exceptions at this time and
> all traffic on the ZooKeeper cluster essentially went to zero for a moment
> before resuming normal operation with new connections.
>
> (2) Probably unrelated since I haven't correlated it temporally with the
> client errors, but running
> "sudo strace -r -T -f -p 9574 -e trace=fsync,fdatasync -o trace.txt"
> turns up some messages like
> "10581 0.000246 --- SIGSEGV (Segmentation fault) @ 0 (0) ---"
>
>
> ZK Version: 3.3.4
> Cluster has 5 nodes running in EC2
>
> Here is a screenshot showing ZooKeeper network traffic going to zero at the
> time of the connection closed exceptions: http://i.imgur.com/dfNh0.png
>
>
>
> Does anyone have ideas on what could be causing these "waves" of
> CancelledKeyExceptions?
>
>
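
To check whether the slow fsyncs line up with the exception waves, the warnings quoted above could be pulled out of the server logs and compared against the times of the client errors. A rough sketch: the regular expression is based only on the message text quoted in this thread, and the function name and the 1000 ms default threshold are arbitrary choices made for illustration.

```python
import re

# Matches the warning text quoted in this thread, e.g.
# "fsync-ing the write ahead log in SyncThread:0 took 6309ms which will ..."
FSYNC_WARNING = re.compile(r"fsync-ing the write ahead log in (\S+) took (\d+)ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return (thread, duration_ms) for each fsync warning at or above the threshold."""
    hits = []
    for line in lines:
        match = FSYNC_WARNING.search(line)
        if match and int(match.group(2)) >= threshold_ms:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

Applied to each node's log (for example `slow_fsyncs(open(path))`, where `path` is wherever that server writes its log), this gives a list of slow syncs per node that can be compared with the timestamps of the CancelledKeyException bursts.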
--
Best regards,
 Vitalii Tymchyshyn