Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper, mail # dev - Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


+
Thawan Kooburat 2013-05-20, 05:40
+
fpj@... 2013-05-24, 00:19
+
Thawan Kooburat 2013-05-24, 04:30
+
Thawan Kooburat 2013-05-25, 02:23
+
Raul Gutierrez Segales 2013-05-24, 22:17
+
Thawan Kooburat 2013-05-29, 06:46
Copy link to this message
-
Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
fpj@... 2013-05-28, 21:08


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.
>
> Thawan Kooburat wrote:
>     From learner synchronization perspective, the leader what to send only txns which has zxid > peerLastSeenZxid.  I have to do this either in this layer or in the LearnerHandler. Doing in in this layer is easier to write a unit test and reduce code complexity in LearnerHandler.
>    
>

Ok, I think my comment is mostly about style, I rarely see an init method with parameters, but there is nothing really broken about doing it.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?
>
> Thawan Kooburat wrote:
>     This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.

I think the read method you say exists is in FileTxnLog. My comment is about having a read() method in FileTxnSnapLog because the read could be either for a txn log or a snapshot. I don't think we would be renaming an existing method, but perhaps I'm not getting your comment right.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 677
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line677>
> >
> >     I think this means to say that there is no txn for this epoch yet because the leader of the epoch is just starting, is it right?
>
> Thawan Kooburat wrote:
>     Any zxid should point to a valid txn.  However, this zxid point to NEWLEADER txn which is never persist to txnlog. If I don't have special handling for this case, the leader will think that this learner has a txn which it has never seen. So it will send TRUNC message to the learner and cause it to crash.
>    
>     Some follower may report that it has this zxid, if it gets a snapshot when the quorum is recently formed or get NEWLEADER package as part of outstanding proposals.  
>    
>     You may also noticed that this patch only changes the leader side. This is intentional because I need to support rolling upgrade.

Ok, sounds good.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.
>
> Thawan Kooburat wrote:
>     You mean that I should use this? http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29
>

Yep!
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 726
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line726>
> >
> >     I'm not sure I understand the second part of this case. Is this referring to a follower that is starting with a clean state?
>
> Thawan Kooburat wrote:
>     See my comment about NEWLEADER zxid above

Ok.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 806
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line806>
> >
> >     This comment implies that if peerLastZxid == maxZxid, we still queue maxZxid, but it is not necessary, right? I have a related comment in one of the test cases.
>
> Thawan Kooburat wrote:
>     If there is a better math notation, let me know. This method won't queue maxZxid in this case.

No, I think this is ok, I just wanted to make sure I had the right understanding.

When you say that a quorum is formed, which point of the protocol you're referring to? When a quorum has acknowledged NEWLEADER or some earlier point?

You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block?

        if (needOpPacket && !needSnap) {
            // This should never happen, but we should fall back to sending
            // snapshot just in case.

            LOG.error("Unhandled scenario for peer sid: " +  getSid() +
                     " fall back to use snapshot");

            needSnap = true;
        }

If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?

Ok.

My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.  
- fpj
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
+
Thawan Kooburat 2013-05-29, 19:11
+
fpj@... 2013-05-30, 14:00
+
Thawan Kooburat 2013-05-30, 18:41
+
fpj@... 2013-05-31, 07:06
+
Thawan Kooburat 2013-06-01, 00:08
+
Thawan Kooburat 2013-06-24, 08:40
+
fpj@... 2013-05-28, 21:08
+
Thawan Kooburat 2013-05-24, 22:55
+
Thawan Kooburat 2013-06-24, 08:40