On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
> Aug 9 07:07:20 a2s1 python: OperationTimeoutException: operation
That's one of the "retryable exceptions" in Kazoo. If you wrap your
calls in client.retry, you can tolerate one or more instances of this
error.
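A minimal sketch of what that could look like, assuming a znode read is the call that times out. The helper name read_with_retry, the host string, and the path "/demo" are made up for illustration. How often and how long the call is retried depends on the retry policy configured on the client.

```python
def read_with_retry(client, path):
    """Read a znode through client.retry so retryable errors are retried."""
    # client.retry(func, *args, **kwargs) calls func(*args, **kwargs) and
    # calls it again if it raises one of Kazoo's retryable exceptions,
    # according to the client's retry policy.
    data, _stat = client.retry(client.get, path)
    return data


def main():
    # Hypothetical host and path; adjust to your own setup.
    from kazoo.client import KazooClient

    client = KazooClient(hosts="127.0.0.1:2181")
    client.start()
    try:
        print(read_with_retry(client, "/demo"))
    finally:
        client.stop()
```

Calling main() needs a reachable ZooKeeper server. Note that retrying only makes sense for operations that are safe to repeat. A plain read is, but a create of a sequential node, for example, might not be.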
> zookeeper logs around the error time:
> 2013-08-09 07:07:06,580 [myid:] - WARN [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting
An fsync stall of more than 2 seconds is quite long. With stalls like
that, or with GC pauses, you are more than likely to exceed the session
timeout.
Did you follow the recommendations in
around using dedicated disks for the transaction log and using a
dedicated machine for ZooKeeper to avoid other processes stalling it?
> Could the client (Kazoo) be timing out because of fsync delay? What
> parameter would control duration for OperationTimeoutException that I can
> perhaps increase to verify? There is only ZooKeeper client and the load
> isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper configuration
> is default. Kazoo client params are also default.
In the admin guide, look at tickTime and syncLimit. In a default config
the session timeout is ~4 seconds. You can increase this value, but
doing so also increases the minimum time it takes ZooKeeper to notice
that a client has actually died. Depending on what you use ZK for, you
might prefer to fail fast and therefore keep the session timeout low.
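
To make the arithmetic concrete, here is a rough sketch of how the server bounds the session timeout a client asks for. It assumes the documented defaults: tickTime of 2000 ms, minSessionTimeout of 2 x tickTime, and maxSessionTimeout of 20 x tickTime. The function name is made up, and this only models the clamping, not the actual server code.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000):
    """Clamp a requested session timeout to the server's allowed range."""
    # Assumed defaults: the server accepts timeouts between
    # 2 * tickTime and 20 * tickTime unless configured otherwise.
    min_ms = 2 * tick_time_ms
    max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


for requested in (1000, 10000, 60000):
    print(requested, "->", negotiated_session_timeout_ms(requested))
```

With a tickTime of 2000 ms the lower bound works out to the ~4 seconds mentioned above. On the Kazoo side, the requested value is the timeout argument to KazooClient, given in seconds.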