I think this discussion was triggered by one we had on ZOOKEEPER-1413. The patch Thawan proposed there has a method that reads txn logs and simply logs an error if an exception occurs while reading the log. I asked whether we should do more than log an error message, and that started the discussion about the txn log. It seems to be out of scope for 1413, so we thought it would be good to have this discussion separately.
Here are a few thoughts about the issue. We can't really tolerate arbitrary corruption of the txn log because it could mean that we lose the quorum for a txn that has already been processed and for which a response has been returned to the client. In the case that a faulty server only partially writes a txn into its txn log because it crashes, the logged txn is corrupt, but we don't really have an issue: the server has not acked the txn, so if there is a quorum for that txn, the faulty server is not part of it. I believe we can do something about cases like this, but more generally, taking care of txn log integrity sounds like a hard problem.
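To make the crash case concrete, here is a minimal sketch in Java of a reader that tolerates a partially written record at the tail of the log but refuses to continue past a record that fails its checksum. The record layout ([int length][long checksum][payload]), the class TxnLogScan, and CorruptLogException are all hypothetical and made up for illustration; this is not ZooKeeper's actual on-disk format or API, and it models a torn write as a simple truncation.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical log reader: NOT ZooKeeper's FileTxnLog, just an illustration.
class TxnLogScan {
    // Header is [int length][long checksum].
    static final int HEADER_BYTES = Integer.BYTES + Long.BYTES;

    /** Raised when a complete record fails its checksum or has a bad length. */
    static class CorruptLogException extends IOException {
        CorruptLogException(String msg) { super(msg); }
    }

    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Appends one record in the assumed layout. */
    static void append(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.writeLong(checksum(payload));
        out.write(payload);
    }

    /** Returns every intact txn; stops quietly at a truncated tail record. */
    static List<byte[]> readAll(byte[] log) throws IOException {
        List<byte[]> txns = new ArrayList<>();
        // For a ByteArrayInputStream, available() is the exact number of bytes left.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        while (in.available() > 0) {
            if (in.available() < HEADER_BYTES) {
                break; // header cut short: the writer crashed mid-record
            }
            int len = in.readInt();
            long expected = in.readLong();
            if (len < 0) {
                throw new CorruptLogException("negative record length " + len);
            }
            if (len > in.available()) {
                break; // payload cut short: treated as an unacked, torn write
            }
            byte[] payload = new byte[len];
            in.readFully(payload);
            if (checksum(payload) != expected) {
                // A complete record with a bad checksum may have been acked,
                // so surface the problem instead of only logging it.
                throw new CorruptLogException("checksum mismatch after "
                        + txns.size() + " good records");
            }
            txns.add(payload);
        }
        return txns;
    }
}
```

Note the limitation: if a length field in the middle of the log is corrupted so that it points past the end of the file, this sketch cannot tell it apart from a torn tail and silently drops the records that follow. That is one reason the general integrity problem looks hard.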
On Jun 1, 2013, at 4:29 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:
> I think it's an interesting idea, certainly worth discussing. Do you have
> any proposals for how we might modify it? What should we think about wrt
> migration/backwards compatibility?
> On Fri, May 31, 2013 at 8:26 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>> I just want to start a discussion about the usage of txnlog. Here is the
>> list of features that need to look up information from txnlog. These
>> features need to ensure the integrity of txnlog, and having an efficient
>> lookup is good for performance as well.
>> ZOOKEEPER-1413 - The leader uses txnlog to synchronize with the
>> learners. It needs to read txnlog in a sequential manner starting from a given
>> ZOOKEEPER-22 – The design proposal mentioned that the leader should look up
>> txnlog to respond to the client on whether a request is accepted by the client or
>> not. The server needs to look up a txn by sessionId and cxid.
>> ZOOKEEPER-1416 – The server needs to be able to tell the list of deleted
>> nodes starting from a given zxid. One possible implementation is to walk txnlog
>> starting from a given zxid and look for delete txns.
>> Do we need to change the way we store txnlog so that we can ensure
>> integrity and more efficient lookup?
>> Thawan Kooburat