Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # dev >> Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


Copy link to this message
-
Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
>
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.
>
> fpj wrote:
>     You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block?
>    
>             if (needOpPacket && !needSnap) {
>                 // This should never happen, but we should fall back to sending
>                 // snapshot just in case.
>    
>                 LOG.error("Unhandled scenario for peer sid: " +  getSid() +
>                          " fall back to use snapshot");
>    
>                 needSnap = true;
>             }
>    
>     If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?
>
>
> Thawan Kooburat wrote:
>     This block just act as a safety net to send SNAP if there is case that we forgot to handle (in addition to this one).  When I thought about this carefully, I am not sure if I ever see this log line (Cannot send TRUNC to peer sid...) in our prod or not.
>
> fpj wrote:
>     I was actually asking how we end up sending a snapshot. If I get it right, we need to set needSnap and the only place I could spot that sets needSnap after hitting the case I originally commented on (the comment about being error or fatal). Is the block I copied above the one that causes us to send a snapshot in the case we hit the scenario of LearnerHandler:750?

Yes, your understanding is correct
> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
>
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>    
>     hasNext = itr.next();
>    
>     which for prefetching the next element.
>    
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
>
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
>
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation.
>    
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
>
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.
>
> Thawan Kooburat wrote:
>     From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.
>    
>     The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.

This depends on how we decide to handle the error.  With the current code, it will be easier but slower to verify the integrity of the entire txnlog file during db.getProposalsFromTxnLog() call,  so the leader won't enter try to use txnlog at all and fallback to use snapshot. If we decide to load txnlog on-demand, we need a way to propagate the exception and abort the DIFF attempt.

Right now, I am not sure if we have a way to verify the integrity of txnlog file.  For example, we should have a notion that a file is already closed and it should contains txn [1..10].  If we get IOException before reading txn 10, its means that txnlog is corrupted. If a file is not close, it means that this is the latest file so failing to reading txn 10 may a result of a partial write, so it is ok to ignore.

I think there are 2 main problems that can cause txnlog to fail integrity check needed by this feature: the file is partially corrupt or the operator didn't copied all the txnlog file. We may have to have a separate JIRA to deal with these problem.    
- Thawan
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote: