Scott Fines 2012-07-01, 21:58
Patrick Hunt 2012-07-02, 16:36
Patrick, agreed. I've seen additional threads referencing this thread and
thought I would follow up with what I've learned since.
Due to a missed function call in the Linux timekeeping code, the leap
second was not accounted for properly. As a result, after the leap second,
timers expired one second earlier than requested. Many applications use a
recurring timer of 1 second or less; such timers expired immediately,
causing the application to immediately try to set another timer, ad
infinitum. This infinite loop led to CPU load spikes.
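To illustrate the mechanism, here is a minimal sketch in shell arithmetic. It is a toy model, not the kernel code: `effective_wait_ms` and `count_immediate_expirations` are hypothetical helpers, and the 1000 ms skew is an assumption taken from the one-second error described above.

```sh
#!/bin/sh
# Toy model only: the kernel's timekeeping code is not reproduced here.
# SKEW_MS is an assumption based on "one second earlier than requested".
SKEW_MS=1000

# Print how long a timer actually waits when it fires SKEW_MS early.
effective_wait_ms() {
  requested_ms=$1
  wait_ms=$(( $requested_ms - $SKEW_MS ))
  if [ "$wait_ms" -lt 0 ]; then wait_ms=0; fi
  echo "$wait_ms"
}

# Count how many of N re-armed timers expire immediately (a wait of 0 ms).
count_immediate_expirations() {
  interval_ms=$1
  rearm_count=$2
  immediate=0
  i=0
  while [ "$i" -lt "$rearm_count" ]; do
    if [ "$(effective_wait_ms "$interval_ms")" -eq 0 ]; then
      immediate=$(( $immediate + 1 ))
    fi
    i=$(( $i + 1 ))
  done
  echo "$immediate"
}

effective_wait_ms 500                 # prints 0: a 500 ms timer fires at once
effective_wait_ms 5000                # prints 4000: a 5 s timer fires 1 s early
count_immediate_expirations 1000 5    # prints 5: every re-arm returns immediately
```

In this model any interval of one second or less collapses to a zero wait, so a loop that re-arms such a timer never sleeps, which is consistent with the load spikes people reported.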
In case it's of interest, we wrote a blog post detailing it:
On Mon, Jul 2, 2012 at 9:36 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> Thanks for the report, Scott. From what I've seen so far this seems to
> be a Linux bug and not specific to Java/ZK. Here are a couple of the
> more informative links I've seen:
> Does anyone have specific insight into how this expressed itself in Java?
> I've seen some references to futex being the root (from a Java
> perspective) "It's a critical Linux bug that causes futex to timeout,
> and anything that uses it to behave incorrectly."
> On Sun, Jul 1, 2012 at 2:58 PM, Scott Fines <[EMAIL PROTECTED]> wrote:
> > Hello all,
> > It appears that ZooKeeper is subject to the Linux leap second bug that
> > has caused problems with Cassandra and other services. At least, I
> > discovered that after 6 hours of trying to figure out why my cluster wasn't
> > giving me a quorum.
> > A link to the kernel bug report is at
> > As for what you might see in your logs: I saw a lost quorum, insanely
> > high load on my servers, and when I shut down ZooKeeper to bring it back
> > up, one machine would report a read timeout during leader election, then
> > report that the server told it to shut down. After that, it would forever
> > be stuck in the LOOKING phase, while another machine might be stuck in any
> > other phase of the election.
> > The fix is simple, though. Just stop ZooKeeper, execute
> > date -s "`date`"
> > or restart your NTP daemon, then start ZooKeeper back up.
> > You MUST restart ZooKeeper; otherwise, the election state doesn't
> > recover (or, at least, it didn't recover for me).
> > Hope this helps save someone else the 7 hours of agony I just went through.
> > Scott Fines
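
The recovery steps Scott describes (stop ZooKeeper, reset the clock, start ZooKeeper) can be sketched as a small script. This is a hedged sketch, not a tested procedure: it assumes ZooKeeper's `zkServer.sh` is on the PATH, that you have the privileges `date -s` needs, and the `DRY_RUN` and `ZK_SERVER` variables and the `run` helper are hypothetical names introduced here. It uses `$(date)`, which is equivalent to the backtick form in the original message.

```sh
#!/bin/sh
# Sketch of the workaround from this thread. By default it only prints the
# commands; set DRY_RUN=0 (as root) to actually execute them.
DRY_RUN=${DRY_RUN:-1}
# Assumption: zkServer.sh is on the PATH; override ZK_SERVER otherwise.
ZK_SERVER=${ZK_SERVER:-zkServer.sh}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Stop ZooKeeper, set the clock to its current value, then start ZooKeeper.
# Each step runs only if the previous one succeeded.
recover_from_leap_second() {
  run "$ZK_SERVER" stop &&
  run date -s "$(date)" &&
  run "$ZK_SERVER" start
}

recover_from_leap_second
```

Restarting the NTP daemon in place of the `date -s` step is the alternative mentioned above; the service name differs between distributions, so it is left out of the sketch.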