Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper, mail # dev - Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


+
Thawan Kooburat 2013-05-20, 05:40
+
fpj@... 2013-05-24, 00:19
Copy link to this message
-
Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
Thawan Kooburat 2013-05-24, 04:30


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?

IOException can be thrown from  

hasNext = itr.next();

which for prefetching the next element.

I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/ZKDatabase.java, line 82
> > <https://reviews.apache.org/r/11231/diff/1/?file=293956#file293956line82>
> >
> >     1/3 seems reasonable but arbitrary. I'm curious to know if there is a concrete reason behind the choice or if it just feels right for the default.

This require some tuning. I will comment in the JIRA since this is related to other diff as well
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 523
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line523>
> >
> >     What if we do the fast forwarding in this method instead of having a boolean passed to init?

sure, I can refactor it
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.

>From learner synchronization perspective, the leader what to send only txns which has zxid > peerLastSeenZxid.  I have to do this either in this layer or in the LearnerHandler. Doing in in this layer is easier to write a unit test and reduce code complexity in LearnerHandler.
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?

This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.  
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 677
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line677>
> >
> >     I think this means to say that there is no txn for this epoch yet because the leader of the epoch is just starting, is it right?

Any zxid should point to a valid txn.  However, this zxid point to NEWLEADER txn which is never persist to txnlog. If I don't have special handling for this case, the leader will think that this learner has a txn which it has never seen. So it will send TRUNC message to the learner and cause it to crash.

Some follower may report that it has this zxid, if it gets a snapshot when the quorum is recently formed or get NEWLEADER package as part of outstanding proposals.  

You may also noticed that this patch only changes the leader side. This is intentional because I need to support rolling upgrade.  
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.

You mean that I should use this? http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 726
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line726>

See my comment about NEWLEADER zxid above

If there is a better math notation, let me know. This method won't queue maxZxid in this case.

no problem

This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.

Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.  

DIFF packet
- Thawan
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
+
Thawan Kooburat 2013-05-25, 02:23
+
Raul Gutierrez Segales 2013-05-24, 22:17
+
Thawan Kooburat 2013-05-29, 06:46
+
fpj@... 2013-05-28, 21:08
+
Thawan Kooburat 2013-05-29, 19:11
+
fpj@... 2013-05-30, 14:00
+
Thawan Kooburat 2013-05-30, 18:41
+
fpj@... 2013-05-31, 07:06
+
Thawan Kooburat 2013-06-01, 00:08
+
Thawan Kooburat 2013-06-24, 08:40
+
fpj@... 2013-05-28, 21:08
+
Thawan Kooburat 2013-05-24, 22:55
+
Thawan Kooburat 2013-06-24, 08:40