Zookeeper >> mail # user >> what would happen with this case ? (ZAB protocol question)


Yang 2011-07-19, 21:44
Yang 2011-07-20, 07:28
Alexander Shraer 2011-07-21, 18:04
Alexander Shraer 2011-07-21, 20:11
Ted Dunning 2011-07-21, 20:24
RE: what would happen with this case ? (ZAB protocol question)
Hi Ted,

I see no problem in your scenario. The problem is in another scenario, which I described in the JIRA: there, C has seen more proposals than B, but B has seen more commits than C. When leader election happens (assuming they don't restart beforehand), B will be elected leader rather than C. That is a problem because C's suffix of transactions, which were acked by both A and C, will be truncated.

Alex
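
The scenario above can be sketched as a toy model. This is only an illustration under stated assumptions, not ZooKeeper's election code: the `Server` class, `elect`, `lost_transactions`, and the specific transaction numbers are all hypothetical, and a real election also compares epochs and server ids. It shows that ranking servers by the last transaction they received a commit for picks B, whose log lacks a transaction that a quorum (A and C) acknowledged.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    last_logged: int     # highest transaction the server has logged (proposal seen)
    last_committed: int  # highest transaction for which it received a commit


def elect(servers, key):
    """Pick the server ranked highest by `key` (toy stand-in for leader election)."""
    return max(servers, key=key)


def lost_transactions(leader, quorum_acked):
    """Quorum-acknowledged transactions that are missing from the new leader's log."""
    return [t for t in quorum_acked if t > leader.last_logged]


# Assumed state after A (the old leader) goes away:
# B logged 1-3 and received commits for 1-3;
# C logged 1-4 but received commits only for 1-2.
b = Server("B", last_logged=3, last_committed=3)
c = Server("C", last_logged=4, last_committed=2)
quorum_acked = [1, 2, 3, 4]  # transaction 4 was acked by A and C

by_commit = elect([b, c], key=lambda s: s.last_committed)
by_log = elect([b, c], key=lambda s: s.last_logged)

print(by_commit.name, lost_transactions(by_commit, quorum_acked))  # B [4]
print(by_log.name, lost_transactions(by_log, quorum_acked))        # C []
```

In this model, electing on what each server has logged rather than on what it has seen committed keeps transaction 4; whether that matches the change proposed in the JIRA is not stated in this thread.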

> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 21, 2011 1:25 PM
> To: [EMAIL PROTECTED]
> Cc: Yang
> Subject: Re: what would happen with this case ? (ZAB protocol question)
>
> Alex,
>
> Are you sure that this is a bug?
>
> Take the case of three servers A, B and C with A being leader.
>
> If transactions 1, 2 and 3 are committed, then a majority of the nodes,
> including at least A, must have seen these transactions.  Moreover,
> transactions cannot be committed on a node unless all previous transactions
> have been seen on that node as well.  Thus, by symmetry, we can consider
> cases where B alone committed these transactions or where B and C committed
> them.  Only the first case is problematic.
>
> Now, assume further that transaction 4 has arrived at B and been forwarded
> to A, but neither B nor C has committed it.
>
> The situation now is that in this first epoch, A has seen 1-4, B has seen
> 1-3 and C has seen nothing.  At least two nodes know the current epoch
> because we obviously have a quorum and we know that B knows the current
> epoch because it has seen transactions from this epoch.  Thus the collection
> of machines that know the current epoch can be A+B or A+B+C.
>
> If all three nodes now die simultaneously and B and C come back up, the
> question is what will happen.  We know that the two nodes will agree on the
> epoch because at least B has the last epoch.  Node B will be elected leader
> because it has seen later transactions than C.  C will now get the
> transactions and we have a quorum in a new epoch.
>
> If A returns at this point, it will know about transactions 1, 2, 3 and 4.
> Further, it will know that 1, 2, and 3 have been committed in the first
> epoch and that 4 was proposed, but never committed.  As it joins, it will
> find that a new epoch has started and will recognize B as master.  B will
> tell it to truncate the log by deleting 4, but 4 was never committed anyway.
>
> Where is the problem?
>
> On Thu, Jul 21, 2011 at 1:11 PM, Alexander Shraer <shralex@yahoo-inc.com> wrote:
>
> > The problem is in leader election: if the server doesn't reboot before
> > running leader election (the usual case), then only the transactions for
> > which it received a commit count, and it might not be elected leader even if
> > it has seen more transactions than the others. This may lead to transactions
> > being dropped.
> >
> > I opened a JIRA for this.
> >
Ted Dunning 2011-07-21, 22:09