I have noticed the following pattern in our cluster today:
The leader reports:
> Unexpected exception causing shutdown while sock still open
> ******* GOODBYE ... ********
All the other servers report:
> Exception when following the leader
> java.net.SocketTimeoutException: Read timed out
This is a five-node cluster running ZK 3.3.3 (yes, it's very old, sorry).
It all happened within a second across the whole cluster. Does that
sound like a network issue?
As a side effect, the database got corrupted as well. Does anybody know if
this is a known issue in 3.3.3? I checked the release notes and JIRA
tickets but didn't find anything that looks like the pattern we saw.