-
what would happen with this case ? (ZAB protocol question)
Yang 2011-07-19, 21:44
As in the first figure of the ZAB paper, say we have nodes A, B, and C, and A is the leader now.
All 3 nodes see proposals P1 and P2, and all acked both. A sees the acks for P1 and commits it, but right after this A dies.
Now B is elected. B does not see any commit, so (according to my possibly wrong understanding from the code) B throws away P1 and P2 and starts a new epoch. Is this the current behavior of the code?
But then the commit of P1 on A is lost?
Thanks Yang
-
Re: what would happen with this case ? (ZAB protocol question)
Yang 2011-07-20, 07:28
I found that my question is basically the same as

http://zookeeper-user.578899.n2.nabble.com/Q-about-ZK-internal-how-commit-is-being-remembered-td4464847.html

but reading that thread still leaves me unclear as to my original question.

The following snippet from LearnerHandler.run() seems to be what the newly elected leader is doing: basically bringing every follower up to its max committed proposal and discarding the rest. If this is a correct understanding, then the P1 commit in my original question seems to be lost?

Thanks
Yang

    final long maxCommittedLog = leader.zk.getZKDatabase().getmaxCommittedLog();
    final long minCommittedLog = leader.zk.getZKDatabase().getminCommittedLog();
    LinkedList<Proposal> proposals = leader.zk.getZKDatabase().getCommittedLog();
    if (proposals.size() != 0) {
        if ((maxCommittedLog >= peerLastZxid)
                && (minCommittedLog <= peerLastZxid)) {
            // Follower is within the leader's committed window: send the missing suffix.
            packetToSend = Leader.DIFF;
            zxidToSend = maxCommittedLog;
            for (Proposal propose : proposals) {
                if (propose.packet.getZxid() > peerLastZxid) {
                    queuePacket(propose.packet);
                    QuorumPacket qcommit = new QuorumPacket(Leader.COMMIT,
                            propose.packet.getZxid(), null, null);
                    queuePacket(qcommit);
                }
            }
        } else if (peerLastZxid > maxCommittedLog) {
            // Follower is ahead of the leader: tell it to truncate its log.
            packetToSend = Leader.TRUNC;
            zxidToSend = maxCommittedLog;
            updates = zxidToSend;
        }
    } else {
        // just let the state transfer happen
    }
-
RE: what would happen with this case ? (ZAB protocol question)
Alexander Shraer 2011-07-21, 18:04
Hi,

If I understand it correctly, when a server starts up it locally commits all ops it has ever received (see ZKDatabase.loadDataBase). Leader election then chooses the node that has the most ops committed to be the leader.

It is possible that a minority of servers are down during leader election, but a majority (or quorum) do participate in leader election. Any operation that was truly committed (acked by a majority) will be known to at least one of the servers participating in the leader election, so the elected leader will at least know all truly committed ops.

If a server wakes up later and connects to this leader, its log is truncated to match the leader's. But this is safe to do, because as explained above none of the truncated ops could have been previously acked by a quorum.

Alex
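The quorum argument above can be checked by brute force for a small ensemble. The following is only a sketch under stated assumptions: servers are numbered 0..n-1 (A=0, B=1, C=2), and the class and method names are hypothetical and not taken from the ZooKeeper code base. It enumerates every majority that could take part in an election and checks whether each one contains at least one server that acked a given op.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: none of these names come from ZooKeeper. */
final class QuorumIntersectionSketch {

    /** Enumerates every subset of {0..n-1} that holds a strict majority of the servers. */
    static List<Set<Integer>> majorities(int n) {
        List<Set<Integer>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) * 2 > n) {
                Set<Integer> quorum = new HashSet<>();
                for (int i = 0; i < n; i++) {
                    if ((mask & (1 << i)) != 0) {
                        quorum.add(i);
                    }
                }
                result.add(quorum);
            }
        }
        return result;
    }

    /** True if every possible election quorum contains at least one server from ackers. */
    static boolean everyQuorumOverlaps(int n, Set<Integer> ackers) {
        for (Set<Integer> quorum : majorities(n)) {
            boolean overlaps = false;
            for (int server : quorum) {
                if (ackers.contains(server)) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A and C acked the op (a majority): every election quorum knows about it.
        System.out.println(everyQuorumOverlaps(3, Set.of(0, 2))); // prints true
        // Only A logged the op: the quorum {B, C} has never seen it.
        System.out.println(everyQuorumOverlaps(3, Set.of(0)));    // prints false
    }
}
```

The check only shows that some participant has seen the op. Whether that participant is the one elected depends on what the election compares, which is the point the rest of the thread turns on.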
-
RE: what would happen with this case ? (ZAB protocol question)
Alexander Shraer 2011-07-21, 20:11
I think you're right - there is a bug here.

As I mentioned, when a server starts up it locally commits all ops it has ever received (see ZKDatabase.loadDataBase). More importantly, the same happens in the Leader.lead() method (zk.loadData()). So when execution reaches the code you quoted, maxCommittedLog reflects all transactions this leader has seen before becoming a leader, and everything works. In your scenario everyone sees the same set of transactions, so there is no problem.

The problem is in leader election: if the server doesn't reboot before running leader election (the usual case), then only the transactions for which it received a commit count, and it might not be elected leader even if it has seen more transactions than the others. This may lead to transactions being dropped.

I opened a JIRA for this.

Thanks,
Alex

> -----Original Message-----
> From: Yang [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 21, 2011 11:12 AM
> To: Alexander Shraer
> Subject: Re: what would happen with this case ? (ZAB protocol question)
>
> "Any operation that was truly committed (acked by majority), will be
> known to one of the servers participating in the leader election"
> ------ this is where I'm having difficulty: in the example I gave, the
> commit on the dead leader is "Known/seen" by surviving nodes, but the
> code snippet I showed seems to suggest that only seen COMMITTED txns
> are replayed from new leader, not the seen transactions.
>
> thanks
> Yang
-
Re: what would happen with this case ? (ZAB protocol question)
Ted Dunning 2011-07-21, 20:24
Alex,
Are you sure that this is a bug?
Take the case of three servers A, B and C with A being leader.
If transactions 1, 2 and 3 are committed, then a majority of the nodes, including at least A, must have seen these transactions. Moreover, transactions cannot be committed on a node unless all previous transactions have been seen on that node as well. Thus, by symmetry, we can consider cases where B alone committed these transactions or where B and C committed them. Only the first case is problematic.
Now, assume further that transaction 4 has arrived at B and been forwarded to A but neither B nor C have committed to it.
The situation now is that in this first epoch, A has seen 1-4, B has seen 1-3 and C has seen nothing. At least two nodes know the current epoch because we obviously have a quorum and we know that B knows the current epoch because it has seen transactions from this epoch. Thus the collection of machines that know the current epoch can be A+B or A+B+C.
If all three nodes now die simultaneously and B and C come back up, the question is what will happen. We know that the two nodes will agree on the epoch because at least B has the last epoch. Node B will be elected leader because it has seen later transactions than C. C will now get the transactions and we have a quorum in a new epoch.
If A returns at this point, it will know about transactions 1, 2, 3 and 4. Further, it will know that 1, 2, and 3 have been committed in the first epoch and that 4 was proposed, but never committed. As it joins, it will find that a new epoch has started and will recognize B as master. B will tell it to truncate the log by deleting 4, but 4 was never committed anyway.
Where is the problem?
On Thu, Jul 21, 2011 at 1:11 PM, Alexander Shraer <[EMAIL PROTECTED]> wrote:

> The problem is in leader election - if the server doesn't reboot before
> running leader election (the usual case) then only the transactions for
> which it received a commit count and it might not be elected leader, even if
> it has seen more transactions than the others. This may lead to transactions
> being dropped.
>
> I opened a JIRA for this.
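The three-server scenario in this message can be traced with a small sketch of the sync decision from the LearnerHandler.run() snippet quoted earlier in the thread. This is a simplification, not the real method: the enum, class, and method names are hypothetical, and treating the "neither branch matched" case as a full state transfer (SNAP) is an assumption based on the snippet's "just let the state transfer happen" comment.

```java
/** Illustrative only: a stripped-down version of the DIFF/TRUNC choice in the quoted snippet. */
final class SyncDecisionSketch {

    enum Sync { DIFF, TRUNC, SNAP }

    /**
     * Chooses how the leader brings a follower up to date, given the leader's
     * committed window [minCommittedLog, maxCommittedLog] and the follower's last zxid.
     */
    static Sync decide(long minCommittedLog, long maxCommittedLog, long peerLastZxid) {
        if (maxCommittedLog >= peerLastZxid && minCommittedLog <= peerLastZxid) {
            return Sync.DIFF;   // follower is inside the window: send the missing suffix
        } else if (peerLastZxid > maxCommittedLog) {
            return Sync.TRUNC;  // follower is ahead of the leader: truncate its log
        }
        return Sync.SNAP;       // assumption: anything else falls back to state transfer
    }

    public static void main(String[] args) {
        long min = 1, max = 3; // new leader B holds transactions 1..3
        System.out.println("A (last zxid 4): " + decide(min, max, 4)); // prints A (last zxid 4): TRUNC
        System.out.println("C (last zxid 0): " + decide(min, max, 0)); // prints C (last zxid 0): SNAP
        System.out.println("peer at zxid 2:  " + decide(min, max, 2)); // prints peer at zxid 2:  DIFF
    }
}
```

With B as leader, A is told to truncate transaction 4, which was never acked by a quorum, so nothing committed is lost in this particular scenario.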
-
RE: what would happen with this case ? (ZAB protocol question)
Alexander Shraer 2011-07-21, 20:42
Hi Ted,
In your scenario there is no problem I can see. The problem is in another scenario I described in the JIRA - there C has seen more proposals than B but B has seen more commits than C. When leader election happens (and assuming they don't restart beforehand), B will be elected as leader and not C, which is a problem because C's suffix of transactions which were acked by both A and C will be truncated.
Alex
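A minimal sketch of the scenario described above, with concrete numbers chosen purely for illustration: A and C have logged transactions 1..3 (so 3 is acked by a quorum), B has logged and committed 1..2, and C has received the commit only for 1 when A dies. The `Peer` class, the two comparators, and `lostSuffix` are hypothetical names and do not model the actual election code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: compares electing by last committed zxid vs. last logged zxid. */
final class ElectionOrderingSketch {

    /** Hypothetical per-server view: highest zxid logged and highest zxid committed. */
    static final class Peer {
        final String name;
        final long lastLogged;
        final long lastCommitted;

        Peer(String name, long lastLogged, long lastCommitted) {
            this.name = name;
            this.lastLogged = lastLogged;
            this.lastCommitted = lastCommitted;
        }
    }

    static final Comparator<Peer> BY_COMMITTED = Comparator.comparingLong(p -> p.lastCommitted);
    static final Comparator<Peer> BY_LOGGED = Comparator.comparingLong(p -> p.lastLogged);

    /** Picks the voter that ranks highest under the given ordering. */
    static Peer elect(List<Peer> voters, Comparator<Peer> order) {
        return Collections.max(voters, order);
    }

    /** Number of quorum-acked transactions missing from the leader's log, hence truncated. */
    static long lostSuffix(Peer leader, long quorumAckedUpTo) {
        return Math.max(0, quorumAckedUpTo - leader.lastLogged);
    }

    public static void main(String[] args) {
        Peer b = new Peer("B", 2, 2); // fewer proposals, more commits
        Peer c = new Peer("C", 3, 1); // more proposals, fewer commits
        List<Peer> voters = Arrays.asList(b, c); // A is dead
        long quorumAckedUpTo = 3;                // 1..3 were acked by A and C

        Peer byCommitted = elect(voters, BY_COMMITTED);
        Peer byLogged = elect(voters, BY_LOGGED);
        System.out.println(byCommitted.name + " loses " + lostSuffix(byCommitted, quorumAckedUpTo)); // prints B loses 1
        System.out.println(byLogged.name + " loses " + lostSuffix(byLogged, quorumAckedUpTo));       // prints C loses 0
    }
}
```

Ranking by committed zxid elects B and drops transaction 3 even though a quorum acked it, while ranking by logged zxid elects C and drops nothing.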
-
Re: what would happen with this case ? (ZAB protocol question)
Ted Dunning 2011-07-21, 22:09
I think the message ordering constraints combined with the quorum deal with this situation.