Re: Features that depends on txnlog lookup and its integrity
The two key points I can extract from this discussion are (please feel free to add to my list):

- We can't tolerate arbitrary corruption of log entries. We can tolerate corruption of a log suffix due to a crash, in which case the txns in that suffix have not been acknowledged.
- Verifying digests is possibly expensive, so we might need to look at ways to avoid the performance penalty, such as caching txns in memory (a sketch of the per-txn digest is below).
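
For illustration, here is a minimal sketch of the per-record digest involved. ZooKeeper's txn log stores an Adler32 checksum alongside each serialized txn; the record layout implied below is made up for the example, not the real on-disk format.

    import java.util.zip.Adler32;
    import java.util.zip.Checksum;

    // Sketch: per-txn digest computation and verification. The
    // "stored checksum + payload" shape is illustrative only.
    public class TxnDigest {

        // Checksum over the serialized txn bytes, computed at append time.
        static long digest(byte[] serializedTxn) {
            Checksum crc = new Adler32();
            crc.update(serializedTxn, 0, serializedTxn.length);
            return crc.getValue();
        }

        // Verify on read; a mismatch means this entry (and anything
        // written after it in the same file) cannot be trusted.
        static boolean verify(long storedChecksum, byte[] serializedTxn) {
            return storedChecksum == digest(serializedTxn);
        }
    }

The check itself is cheap; the cost the list above worries about comes from having to re-run it over a long range of txns on every disk read.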

One more comment below:
 
On Jun 4, 2013, at 9:22 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:

>
> On 6/3/13 9:54 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:
>
>> On Jun 3, 2013, at 12:41 AM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>>
>>> From my understanding, ZooKeeper currently maintains data integrity by
>>> validating all the data before loading it into memory. Disk-related
>>> errors on one of the machines won't affect the correctness of the
>>> ensemble, since we are serving client or peer requests from in-memory
>>> data only.
>>
>> Let me try to be a bit more concrete. Say that we arbitrarily corrupt a
>> txn T in a log file, and that T has been acknowledged by 3 servers (S1,
>> S2, S3) in an ensemble of 5 servers (S1, S2, S3, S4, S5). Let's assume
>> that S3 has corrupted T in its log. Next say that S5 becomes the leader
>> supported by S3 and S4 (S3 has restarted). We can elect S5 because S3,
>> having corrupted T, ignores T and any transaction after it, so its
>> effective history matches that of S5, which never had T. If this can
>> happen, then we have lost T even though it was acknowledged by a quorum.
>>
>> In any case, I'm interested in defining precisely what integrity
>> guarantees we provide for txn logs/snapshots. The point I was trying to
>> convey is that we can't tolerate arbitrary corruptions of the txn log. We
>> can only tolerate (and I'm not convinced there is a reason to push it
>> further) corruption of a suffix of the txn log whose txns have not been
>> acknowledged, because the server crashed before they were completely
>> flushed to disk.
>
> I believe the problem you are describing here is essentially that we have
> more failures than we can tolerate. Ideally, if S1 or S2 participated in
> the next round of leader election, S1 or S2 should be elected as the
> leader because they have the highest zxid. S3 has txnlog corruption at T,
> so it should report its zxid as T-1 during leader election.
>
>
> Because of how leader election works, corruption in less than a majority
> should not affect correctness. However, in ZK-1413, ZK-22, and ZK-1416, a
> server uses its local txnlog to respond to a request, so these features
> are vulnerable to a single machine's disk corruption or operator error.
> Still, it won't affect correctness if we can detect the corruption
> reliably.

In the case I described above, there was a corruption on only one server, and yet it caused a problematic scenario. I don't think we can claim that we tolerate corruption of a minority; a single corruption might already be problematic.
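
For illustration, here is a toy model of that scenario, with zxids as plain longs and an "election" that just picks the highest lastZxid in the participating quorum. This is not ZooKeeper's actual election code.

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model: S1-S3 acknowledged txn T; S3's copy of T is corrupted.
    public class LostTxnDemo {
        public static void main(String[] args) {
            long T = 100;
            Map<String, Long> lastZxid = new HashMap<>();
            lastZxid.put("S1", T);
            lastZxid.put("S2", T);
            lastZxid.put("S4", T - 1); // never saw T
            lastZxid.put("S5", T - 1); // never saw T

            // S3 detects the corruption on restart and truncates to T-1.
            lastZxid.put("S3", T - 1);

            // Election among {S3, S4, S5} only (S1 and S2 are down).
            List<String> quorum = Arrays.asList("S3", "S4", "S5");
            String leader = quorum.stream()
                    .max(Comparator.comparingLong(lastZxid::get))
                    .get();

            // Everyone in this quorum is at T-1, so the elected leader
            // lacks T even though T was acknowledged by S1, S2 and S3.
            System.out.println(leader + " elected at zxid " + lastZxid.get(leader));
        }
    }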

>
>>
>>>
>>> However, in ZK-1413, the leader uses the on-disk txnlog to synchronize
>>> with the learner. It seems like we have to keep checking txnlog
>>> integrity every time we read something from disk. And I don't think the
>>> integrity check is cheap either, since we have to scan the entire
>>> history (starting from a given zxid).
>>
>> For the average case, this might not be too bad. If I remember correctly,
>> it is possible to calibrate the number of transactions a server is
>> willing to read from disk when deciding whether to send a snapshot.
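
As a sketch of that calibration, the leader could bound how many txns it is willing to read (and digest-check) from disk for a diff before falling back to a full snapshot. The threshold parameter here is hypothetical, and the real decision also depends on where the follower's zxid falls relative to the leader's committed log.

    // Sketch: diff-vs-snapshot decision bounded by a hypothetical,
    // configurable limit on txns read from disk.
    public class SyncPolicy {
        static boolean sendDiffInsteadOfSnapshot(long followerLastZxid,
                                                 long leaderLastZxid,
                                                 long maxTxnsReadFromDisk) {
            long gap = leaderLastZxid - followerLastZxid;
            // If catching the follower up means scanning more txns than
            // the limit allows, a full snapshot is the cheaper option.
            return gap >= 0 && gap <= maxTxnsReadFromDisk;
        }
    }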
>>
>>>
>>> If we cache the txnlog in memory, we only need to do the integrity
>>> check once, and we can also build some indexes on top of it to support
>>> more efficient lookups. However, this is going to consume a lot of
>>> memory.
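
A minimal sketch of that cache shape (names are illustrative): verify each txn's digest once at insert time, then index by zxid so that range lookups never have to re-scan the on-disk log.

    import java.util.SortedMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Sketch: in-memory txn cache indexed by zxid. Digests are verified
    // once, on insert; lookups are then cheap ordered views.
    public class TxnCache {
        private final ConcurrentSkipListMap<Long, byte[]> byZxid =
                new ConcurrentSkipListMap<>();

        // Caller has already verified this txn's digest.
        public void put(long zxid, byte[] verifiedTxn) {
            byZxid.put(zxid, verifiedTxn);
        }

        // All cached txns with zxid >= fromZxid, in zxid order, e.g. to
        // feed a learner sync without touching the on-disk log.
        public SortedMap<Long, byte[]> tailFrom(long fromZxid) {
            return byZxid.tailMap(fromZxid);
        }
    }

The memory cost scales with how much history is kept, which is the concern raised above.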
>>>
>>
>> Agreed, although I'd rather generate a few numbers before we claim it is
>> bad and that we need a cache.
>
> For 1413, the current implementation works fine if the parameters are