From my understanding, ZooKeeper currently maintains data integrity by
validating all the data before loading it into memory. Disk-related
errors on one of the machines won't affect the correctness of the ensemble
since we serve client and peer requests from in-memory data only.
However, in ZK-1413, the leader uses the on-disk txnlog to synchronize with
the learner. It seems like we have to keep checking txnlog integrity every
time we read something from disk. And I don't think the integrity check is
cheap either, since we have to scan the entire history (starting from a
given zxid).
If we cache the txnlog in memory, we only need to do the integrity check
once, and we can also build some indexes on top of it to support more
efficient lookup. However, this is going to consume a lot of memory.
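
To make the caching idea concrete, here is a rough sketch of a cache that validates each record once at load time and builds two indexes on top. Everything in it is hypothetical: the `Txn` and `TxnLogCache` names, the record layout, and the use of Adler-32 as the per-record checksum are assumptions for illustration, not the actual ZooKeeper classes. It also assumes records arrive sorted by zxid.

```python
import bisect
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical record layout; not the real ZooKeeper format."""
    zxid: int
    session_id: int
    cxid: int
    payload: bytes
    checksum: int  # checksum stored next to the payload on disk


class TxnLogCache:
    """In-memory view of a txnlog whose records are validated once, at load."""

    def __init__(self, records):
        self._txns = []        # validated txns, in zxid order
        self._zxids = []       # parallel list of zxids, for binary search
        self._by_request = {}  # (session_id, cxid) -> txn
        for txn in records:
            if zlib.adler32(txn.payload) != txn.checksum:
                # Stop at the first bad record; the prefix before it is usable.
                break
            self._txns.append(txn)
            self._zxids.append(txn.zxid)
            self._by_request[(txn.session_id, txn.cxid)] = txn

    def since(self, zxid):
        """Return all cached txns with a zxid strictly greater than `zxid`."""
        start = bisect.bisect_right(self._zxids, zxid)
        return self._txns[start:]

    def find_request(self, session_id, cxid):
        """Return the txn for a client request, or None if it is not cached."""
        return self._by_request.get((session_id, cxid))
```

The memory cost is visible here: every payload stays resident, plus one list entry and one dictionary entry per txn.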
On the other hand, these features (ZK-1413, ZK-22, ZK-1416) don't really
need the entire txnlog to be valid. The server can always tell the
client that the history needed to answer the request is too old, and there
is a fallback mechanism that allows the system to make progress correctly.
For example, in ZK-1413, the leader can fall back to sending a snapshot to
the learner if it cannot use the txnlog for any reason.
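
A minimal sketch of that fallback, under stated assumptions: the log is modeled as a list of `(zxid, payload, checksum)` tuples sorted by zxid, Adler-32 stands in for the checksum, and `read_txns_from`, `plan_sync`, and `TxnLogUnusable` are made-up names. The strings "DIFF" and "SNAP" are just tags in this sketch. Only the records actually needed are validated, so corruption in older history does not matter.

```python
import zlib


class TxnLogUnusable(Exception):
    """Raised when the on-disk history cannot serve a request."""


def read_txns_from(log, from_zxid):
    """Return (zxid, payload) pairs with zxid > from_zxid, checking each one.

    `log` stands in for records read back from disk: a list of
    (zxid, payload, checksum) tuples sorted by zxid.
    """
    if not log or from_zxid < log[0][0]:
        # The requested starting point predates what the log still holds.
        raise TxnLogUnusable("history needed for this request is too old")
    out = []
    for zxid, payload, checksum in log:
        if zxid <= from_zxid:
            continue  # older records are not needed, so they are not checked
        if zlib.adler32(payload) != checksum:
            raise TxnLogUnusable(f"checksum mismatch at zxid {zxid}")
        out.append((zxid, payload))
    return out


def plan_sync(log, learner_zxid):
    """Send txns from the log when possible, otherwise fall back to a snapshot."""
    try:
        return ("DIFF", read_txns_from(log, learner_zxid))
    except TxnLogUnusable:
        return ("SNAP", None)
```

With this shape, any failure to use the txnlog (missing history, a bad record) ends in the same safe outcome, which is why the integrity check does not have to cover the whole log.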
On 6/1/13 8:18 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:
>I think this discussion has been triggered by a discussion we have had
>for ZOOKEEPER-1413. In the patch Thawan proposed there, there was a
>method that reads txn logs and it simply logs an error in the case of an
>exception while reading the log. I raised the question of whether we
>should do more than simply logging an error message and the discussion
>about txn log started, but it seems to be a discussion that is out of the
>scope of 1413, so we thought it would be good to have this discussion
>Here are a few thoughts about the issue. We can't really tolerate
>arbitrary corruptions of the txn log because it could imply that we lose
>quorum for a txn that has been processed and a response has been returned
>to the client. In the case that a faulty server only partially writes a
>txn into a txn log because it crashes, the logged txn is corrupt, but we
>don't really have an issue because the server has not acked the txn, so
>if there is a quorum for that txn, the faulty server is not really part
>of it. Cases like this I believe we can do something about, but more
>generally taking care of txn log integrity sounds like a hard problem.
>On Jun 1, 2013, at 4:29 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:
>> I think it's an interesting idea certainly worth discussing. Do you have
>> any proposals for how we might modify? What should we think about wrt
>> migration/backwards compatibility?
>> On Fri, May 31, 2013 at 8:26 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>>> I just want to start a discussion about the usage of txnlog. Here is a
>>> list of features that need to look up information from the txnlog. These
>>> features need to ensure the integrity of the txnlog, and having an
>>> efficient lookup is good for performance as well.
>>> ZOOKEEPER-1413 - The leader uses txnlog to synchronize with the
>>> learners. It needs to read txnlog in a sequential manner starting from a
>>> ZOOKEEPER-22 The design proposal mentioned that the leader should
>>> txnlog to respond to the client if a request is accepted by the
>>> not. The server needs to look up txn by sessionId and cxid
>>> ZOOKEEPER-1416 The server needs to be able to tell the list of deleted
>>> nodes starting from a given zxid. One possible implementation is to walk
>>> starting from a given zxid and look for delete txns.
>>> Do we need to change the way we store txnlog so that we can ensure
>>> integrity and more efficient lookup?
>>> Thawan Kooburat