I was reviewing the ZK code and there is one case that doesn't seem to be handled correctly. I might be reading the code wrong or it may be a bug, so I wanted to run it by you folks.
1. Let's say there are three nodes in the ensemble, A, B and C, with A being the leader.
2. The current epoch is 7.
3. For simplicity of the example, let's say a zxid is a two-digit number, with the epoch being the first digit.
4. The current zxid is 73.
5. All the nodes have seen change 73 and have persistently logged it.
6. A request with zxid 74 is issued. The leader A writes it to its log, but the entire ensemble crashes before B and C write change 74 to their logs.
7. B and C restart; A is still down.
8. B and C form a quorum.
9. B is the new leader. Let's say B's minCommitLog is 71 and its maxCommitLog is 73.
10. The epoch is now 8 and the zxid is 80.
11. A request with zxid 81 succeeds. On B, minCommitLog is still 71 and maxCommitLog is now 81.
12. A starts up. It applies the change from the request with zxid 74 to its in-memory data tree.
13. A contacts B to registerAsFollower and provides 74 as its zxid.
14. Since 71 <= 74 <= 81, B decides to send A a diff. B sends A proposal 81.
The problem with the above sequence is that A's data tree still contains the update from request 74, which is not correct: that request was never logged by a quorum, and B and C have since moved on to epoch 8 without it. Before getting proposal 81, A should have received a trunc to 73, but I don't see that in the code. If the maxCommitLog on B had stayed at 73 instead of being bumped to 81, that case seems to be fine.
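To make the scenario easier to discuss, here is a small toy model of it in Python. It is not ZooKeeper code. The function names (range_only_sync, sync_with_trunc, apply_ops), the "TRUNC"/"DIFF" tuples and the assumption that B's log holds 71, 72, 73 and 81 are all made up for illustration. range_only_sync models my reading of the current behavior (which may be wrong), and sync_with_trunc models what I would expect to happen.

```python
def epoch_of(zxid):
    # Toy encoding from the example: two-digit zxid, first digit is the epoch.
    return zxid // 10


def range_only_sync(committed, peer_zxid):
    """My reading of the current logic: only compare against min/max."""
    lo, hi = committed[0], committed[-1]
    if peer_zxid > hi:
        # Assumed: this is the "maxCommitLog stayed at 73" case that looks fine.
        return [("TRUNC", hi)]
    if lo <= peer_zxid <= hi:
        return [("DIFF", z) for z in committed if z > peer_zxid]
    raise ValueError("peer is behind the window; not covered by this example")


def sync_with_trunc(committed, peer_zxid):
    """What I would expect: trunc first if the leader never saw peer_zxid."""
    lo, hi = committed[0], committed[-1]
    if peer_zxid > hi:
        return [("TRUNC", hi)]
    if lo <= peer_zxid <= hi:
        # Newest zxid in the leader's log that the follower can also have.
        last_common = max(z for z in committed if z <= peer_zxid)
        ops = []
        if last_common != peer_zxid:
            ops.append(("TRUNC", last_common))
        ops += [("DIFF", z) for z in committed if z > last_common]
        return ops
    raise ValueError("peer is behind the window; not covered by this example")


def apply_ops(log, ops):
    """Apply the leader's sync operations to a follower's log of zxids."""
    result = list(log)
    for op, zxid in ops:
        if op == "TRUNC":
            result = [z for z in result if z <= zxid]
        else:  # "DIFF": the follower appends the proposal
            result.append(zxid)
    return result


b_log = [71, 72, 73, 81]   # assumed contents of B's committed log
a_log = [71, 72, 73, 74]   # A logged 74 before the crash

print(apply_ops(a_log, range_only_sync(b_log, 74)))  # [71, 72, 73, 74, 81]
print(apply_ops(a_log, sync_with_trunc(b_log, 74)))  # [71, 72, 73, 81]
```

With the range-only check, A ends up holding both 74 and 81, which is the divergence I am worried about. With the extra lookup, A drops 74 before applying 81 and matches B.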
Looking forward to hearing from you guys regarding whether I am missing something in the code or if it is a bug that we need to fix.