Zookeeper >> mail # dev >> robustness in the face of clock changes

robustness in the face of clock changes
I have seen a number of issues at client sites related to cavalier
adjustments of clocks.  Up to now, my response has been to simply say
"don't do that", but lately it has been bugging me and it seems like there
should be a better solution.

The problem scenario involves a step-wise time change on a ZK server node
either forward or backwards.  The issues are:

- a step backwards causes all of the timeouts to be extended by the amount
of the step.  Thus, if you set all clocks back by an hour, no session will
time out for the next hour of real-time.  This is bad.

- a step forward of sufficient size will cause all live sessions to
immediately time out.
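
To make the arithmetic behind these two cases concrete, here is a minimal
sketch. It is not ZK's actual expiry code; the class and method names are
made up for illustration, and it only assumes that expiry is decided by
subtracting a stored wall-clock timestamp from the current one.

```java
// Hypothetical illustration, not taken from the ZooKeeper source.
public class WallClockExpiry {

    /** Expiry check based on two wall-clock readings in milliseconds. */
    public static boolean isExpired(long nowMillis, long lastHeardMillis, long timeoutMillis) {
        return nowMillis - lastHeardMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        long timeout = 5_000;        // 5 s session timeout (assumed value)
        long lastHeard = 1_000_000;  // wall-clock reading at the last heartbeat
        long hour = 3_600_000;

        // 6 s of real time have passed, but the clock was stepped back one hour:
        // the difference is negative, so the session is not expired.
        long nowAfterBackStep = lastHeard + 6_000 - hour;
        System.out.println(isExpired(nowAfterBackStep, lastHeard, timeout));    // false

        // Only 100 ms of real time have passed, but the clock was stepped
        // forward one hour: the session expires immediately.
        long nowAfterForwardStep = lastHeard + 100 + hour;
        System.out.println(isExpired(nowAfterForwardStep, lastHeard, timeout)); // true
    }
}
```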

To investigate solutions, I played around a bit with nanoTime and
currentTimeMillis.  My experiments verified that on Linux, nanoTime is,
indeed, a timer and currentTimeMillis is a reference to the absolute system
clock.  In my test program, I use both as the system time is modified and I
see stable behavior from nanoTime and the predictably goofy behavior from
currentTimeMillis.  My test code is at https://github.com/tdunning/timeSkew
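
For readers who do not want to pull the repository, the kind of comparison
described above can be sketched roughly as follows. This is not the code
from the timeSkew repository; the class name and loop parameters are
assumptions. Run it and change the system clock while it is printing.

```java
// Hypothetical sketch; names and intervals are illustrative only.
public class ClockCompare {

    /** Returns {elapsed ms by currentTimeMillis, elapsed ms by nanoTime} across one sleep. */
    public static long[] measure(long sleepMillis) {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        long wallElapsedMs = System.currentTimeMillis() - wallStart;
        long monoElapsedMs = (System.nanoTime() - monoStart) / 1_000_000;
        return new long[] { wallElapsedMs, monoElapsedMs };
    }

    public static void main(String[] args) {
        // Step the system clock while this runs and compare the two columns.
        for (int i = 0; i < 60; i++) {
            long[] e = measure(1_000);
            System.out.printf("wall=%d ms, nano=%d ms%n", e[0], e[1]);
        }
    }
}
```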

From these tests, it seems that using nanoTime would be substantially
better than using currentTimeMillis in ZK.  I think that Camille brought
this up a while ago, but I don't remember this going forward.  Right now,
ZK is very delicate in the face of clock changes and it seems that it could
be very robust.  Moreover, many naive admins and some experienced admins
seem to have no clue about how to keep their clocks well behaved so this
delicate nature causes lots of problems.
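
As a rough idea of what a nanoTime-based timeout could look like, here is a
minimal sketch. It is not a proposed patch and does not mirror any existing
ZK class; MonotonicSessionTimer and its methods are hypothetical names. The
clock is injected so the logic can be exercised without real waiting.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch, not part of ZooKeeper.
public class MonotonicSessionTimer {
    private final LongSupplier nanoClock;
    private final long timeoutNanos;
    private long lastHeardNanos;

    public MonotonicSessionTimer(long timeoutMillis, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.lastHeardNanos = nanoClock.getAsLong();
    }

    /** Convenience constructor using System.nanoTime as the clock. */
    public MonotonicSessionTimer(long timeoutMillis) {
        this(timeoutMillis, System::nanoTime);
    }

    /** Record a heartbeat. */
    public void touch() {
        lastHeardNanos = nanoClock.getAsLong();
    }

    /** Compare by subtraction, as the nanoTime javadoc recommends, to stay safe on overflow. */
    public boolean isExpired() {
        return nanoClock.getAsLong() - lastHeardNanos >= timeoutNanos;
    }
}
```

Since the check never consults currentTimeMillis, a step change of the
system clock does not enter the calculation at all.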

Should I try to prepare a patch?

One other thing that I see is that I can't find any way to cause a Java
process to sleep for an elapsed time.  All timer-related sleeps that I can
find work relative to absolute time rather than intervals.  The only
work-around I have found is to use Thread.yield() in a polling loop, which
is clearly only one half step above hideous.
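
For reference, the polling work-around looks roughly like this. It is a
minimal sketch with made-up names, and it burns CPU for the whole wait, so
it is shown only to make the idea concrete.

```java
// Hypothetical sketch of the Thread.yield() polling work-around.
public class ElapsedSleep {

    /**
     * Busy-waits until at least the given interval has elapsed according to
     * System.nanoTime, yielding the CPU on every pass. Returns the elapsed
     * time actually observed, in nanoseconds.
     */
    public static long sleepElapsed(long millis) {
        long nanos = millis * 1_000_000L;
        long start = System.nanoTime();
        while (System.nanoTime() - start < nanos) {
            Thread.yield();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = sleepElapsed(250);
        System.out.printf("waited %d ms%n", elapsed / 1_000_000);
    }
}
```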

Relative to ZK, my question is whether there is any critical need anywhere
in ZK for a timed sleep.