What I have seen so far is mostly related to the init/sync limits combined
with snapshot size (ZOOKEEPER-1697, ZOOKEEPER-1521).
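To make the init limit / snapshot size interaction concrete, here is a rough back-of-the-envelope sketch. It relies only on the documented meaning of initLimit (the number of ticks a follower has to connect and sync to the leader, so the budget is initLimit * tickTime). The function names, the snapshot size and the transfer rate are made-up illustrations, not measurements, and the estimate ignores the time spent serializing and loading the snapshot.

```python
# Rough estimate only: compares the initLimit time budget with the time
# needed to ship a snapshot from the leader to a follower.
# All names and numbers below are hypothetical, chosen for illustration.

def init_budget_seconds(tick_time_ms, init_limit):
    """Time a follower has to connect and sync: initLimit * tickTime."""
    return tick_time_ms * init_limit / 1000.0

def snapshot_transfer_seconds(snapshot_bytes, bytes_per_second):
    """Naive transfer time; ignores (de)serialization and disk I/O."""
    return snapshot_bytes / bytes_per_second

def fits_in_init_limit(tick_time_ms, init_limit, snapshot_bytes, bytes_per_second):
    """True if the naive transfer time is within the initLimit budget."""
    budget = init_budget_seconds(tick_time_ms, init_limit)
    transfer = snapshot_transfer_seconds(snapshot_bytes, bytes_per_second)
    return transfer <= budget

# Example: tickTime=2000 ms, initLimit=10 ticks, an assumed 2 GiB snapshot
# and an assumed effective rate of 50 MB/s.
budget = init_budget_seconds(2000, 10)
transfer = snapshot_transfer_seconds(2 * 1024**3, 50 * 10**6)
print("budget: %.1f s, transfer: %.1f s" % (budget, transfer))
```

If the estimated transfer time is close to or above the budget, raising initLimit (or shrinking the snapshot) is the first thing I would try.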
It is possible that clients trying to reconnect cause a load spike on the
server and push it over the limit, but you would need a lot of clients for
that to happen.
I think it will be easier to narrow down the problem by checking in which
phase (e.g. leader election or synchronization) quorum formation fails.
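One rough way to see how far each server has got is to poll it with the "srvr" four-letter command and look at the reported mode. The sketch below assumes the four-letter commands are reachable on the client port (2181 here is just the usual default), and the helper names are my own. Note that a "not serving" reply does not by itself distinguish election from synchronization; the server logs are still needed for that.

```python
import socket

# Reply a server sends to four-letter commands while it is not serving.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"

def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Send a four-letter command and return the full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mode(reply):
    """Extract the server state from a 'srvr' reply.

    Returns the text after 'Mode:' (e.g. 'leader' or 'follower'),
    'not serving' if the server says it is not serving requests,
    or 'unknown' if neither is present.
    """
    if NOT_SERVING in reply:
        return "not serving"
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

# Usage (hypothetical host names):
#   for host in ("zk1", "zk2", "zk3"):
#       print(host, parse_mode(four_letter_word(host)))
```

Running this in a loop while the ensemble restarts, with and without clients connecting, should show whether the servers ever leave the "not serving" state.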
On 5/13/13 10:48 AM, "Marshall McMullen" <[EMAIL PROTECTED]> wrote:
>I'm debugging a problem we're seeing where, after quorum loss, quorum does
>not recover as I expect it should. It seems that I've isolated the problem
>to quorum not being re-established if there are clients trying to connect to
>the ensemble at the same time that the nodes are coming up and trying to
>form quorum. Is there any known issue with this? I've searched for open
>Jiras without any luck.