Re: Zookeeper reconnect failed due to 'state changed (Expired)'
Are you on Linux? We have seen this pattern before (low user/sys time but
high real time in the GC logs). In our case, the problem was caused by disk
I/O. When there are lots of dirty pages (in our case, produced by log4j
logging), Linux can draft user threads (here, the GC threads) to flush the
dirty pages. So all of that real time was spent on disk I/O rather than on
actual GC work. The fix is to tune dirty_expire_centisecs and
to flush dirty pages more frequently to avoid such drafting.
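
If you want to check whether dirty pages are piling up on the broker host, something like the following may help. It is only a rough sketch, not what we actually ran: it assumes a Linux box with the standard /proc/meminfo and /proc/sys/vm files, and the helper names (parse_meminfo_kb, read_vm_setting) are made up for illustration. It only reads the current values; what to set them to depends on your workload.

```python
from pathlib import Path


def parse_meminfo_kb(text, field):
    """Return the kB value of a field (e.g. 'Dirty') from /proc/meminfo text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # Lines look like "Dirty:              1234 kB".
            return int(rest.split()[0])
    raise KeyError(field)


def read_vm_setting(name, root="/proc/sys/vm"):
    """Read an integer sysctl such as dirty_expire_centisecs from /proc/sys/vm."""
    return int((Path(root) / name).read_text().strip())


if __name__ == "__main__":
    # Assumes Linux; these files do not exist on other platforms.
    meminfo = Path("/proc/meminfo").read_text()
    print("Dirty (kB):    ", parse_meminfo_kb(meminfo, "Dirty"))
    print("Writeback (kB):", parse_meminfo_kb(meminfo, "Writeback"))
    print("dirty_expire_centisecs:", read_vm_setting("dirty_expire_centisecs"))
```

Running it a few times around a long GC pause should show whether the Dirty figure is large when the pauses happen.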
On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto <[EMAIL PROTECTED]> wrote: