Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # dev >> Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


+
Thawan Kooburat 2013-05-20, 05:40
+
fpj@... 2013-05-24, 00:19
+
Thawan Kooburat 2013-05-24, 04:30
+
Thawan Kooburat 2013-05-25, 02:23
+
Raul Gutierrez Segales 2013-05-24, 22:17
Copy link to this message
-
Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?
>
> Thawan Kooburat wrote:
>     This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.
>
> fpj wrote:
>     I think the read method you say exists is in FileTxnLog. My comment is about having a read() method in FileTxnSnapLog because the read could be either for a txn log or a snapshot. I don't think we would be renaming an existing method, but perhaps I'm not getting your comment right.

Yes, you are right. I was confused with FileTxnLog.read(). Sure, I can rename this one.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
>
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.
>
> fpj wrote:
>     You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block?
>    
>             if (needOpPacket && !needSnap) {
>                 // This should never happen, but we should fall back to sending
>                 // snapshot just in case.
>    
>                 LOG.error("Unhandled scenario for peer sid: " +  getSid() +
>                          " fall back to use snapshot");
>    
>                 needSnap = true;
>             }
>    
>     If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?
>

This block just act as a safety net to send SNAP if there is case that we forgot to handle (in addition to this one).  When I thought about this carefully, I am not sure if I ever see this log line (Cannot send TRUNC to peer sid...) in our prod or not.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established.
> >    
> >     Why does the learner crash?
>
> Thawan Kooburat wrote:
>     This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.
>
> fpj wrote:
>     When you say that a quorum is formed, which point of the protocol you're referring to? When a quorum has acknowledged NEWLEADER or some earlier point?

I think the case that prompt me to add NewEpochZxid-related stuff is below .   Here the leader has seen the follower transaction T(epoch, zxid)  

1. T(1, 4)
2. T(2, 0)  - NEWLEADER
3. T(3, 0)  - NEWLEADER
4. T(3, 1)

If a follower joined a quorum at epoch 2 and die. Its peerLastZxid can be T(2,0) if it get a snapshot from the leader, or it gets NEWLEADER packet via outstanding proposals.

When the same follower try to join the quorum again, the leader will try to ask the follower to TRUNC to T(1,4) which is not possible because there is no txnlog to truncate and the learner will crash.

The fix is to send a DIFF starting at T(1,4) instead.  However, I think I added a block that handle (Cannot send TRUNC to peer ...) just for safety based on the observation that TRUNC can only work if it is within the same epoch.  
- Thawan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
+
fpj@... 2013-05-28, 21:08
+
Thawan Kooburat 2013-05-29, 19:11
+
fpj@... 2013-05-30, 14:00
+
Thawan Kooburat 2013-05-30, 18:41
+
fpj@... 2013-05-31, 07:06
+
Thawan Kooburat 2013-06-01, 00:08
+
Thawan Kooburat 2013-06-24, 08:40
+
fpj@... 2013-05-28, 21:08
+
Thawan Kooburat 2013-05-24, 22:55
+
Thawan Kooburat 2013-06-24, 08:40