[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729700#comment-13729700 ]
Jay Kreps commented on KAFKA-615:
52.2 In most filesystems there is no guarantee that metadata is flushed before/after/atomically with data. Ext3/4 offers some guarantee with data=ordered, but that mode has other issues, we should not rely on it, and we don't want to require that users run a particular mount option on their filesystem. What I am saying is that I think we cover that case. Casewise:
1. If the truncation point is after the recovery point, then the recovery point remains valid.
2. If the truncation point is before the recovery point then the log is stable up to the truncation point. So in a crash either
a. The metadata flush occurs and the log is correctly truncated, or else
b. The metadata flush doesn't occur and we regain segments of log. This is just the same as if we hadn't truncated the log. The log contents remain stable.
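The two crash outcomes above reduce to one safe rule for the checkpoint value. A minimal sketch with hypothetical names (this is not Kafka's actual code):

```java
// Hypothetical model of the case analysis above, not Kafka's Log class.
// On a crash during truncation the filesystem either persisted the
// truncation (metadata flushed) or it didn't; in both outcomes the log
// contents are stable up to min(recoveryPoint, truncationPoint), so that
// is the safe value to keep as the recovery point.
class CrashCases {
    public static long safeRecoveryPoint(long recoveryPoint, long truncationPoint) {
        // Case 1: truncation above the recovery point -> recovery point still valid.
        // Case 2: truncation below it -> the log is only guaranteed stable up to
        //         the truncation point, so clamp down.
        return Math.min(recoveryPoint, truncationPoint);
    }
}
```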
What I think is a problematic case is if
a. We truncate to offset T where T is less than the recovery point R.
b. We take new writes at offset T, T+1, T+2
c. Then we checkpoint the recovery point at T+2
If this occurred then we have messages T, T+1, etc. which have not been flushed but which are below the recovery point. The question I was asking is: can this happen? I don't fully understand how fetching restarts on a leader change, so I am not sure.
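The sequence above can be modeled with a small sketch (hypothetical names, not Kafka's actual Log class). If truncation pulls the recovery point down to T, and the recovery point only ever advances as part of an actual flush, then the bad state of unflushed messages sitting below the recovery point cannot arise; whether the real truncation and checkpoint paths behave this way is exactly the open question.

```java
// Hypothetical model of the race described above (not Kafka code).
class TruncateRaceSketch {
    long flushedOffset;  // highest offset durably on disk
    long recoveryPoint;  // checkpointed value: offsets below it are assumed flushed
    long endOffset;      // log end, may include buffered (unflushed) writes

    TruncateRaceSketch(long flushed) {
        flushedOffset = recoveryPoint = endOffset = flushed;
    }

    void truncateTo(long t) {
        endOffset = Math.min(endOffset, t);
        flushedOffset = Math.min(flushedOffset, t);
        recoveryPoint = Math.min(recoveryPoint, t);  // the guard under discussion
    }

    void append(long n) { endOffset += n; }  // buffered in page cache, not flushed

    void flush() {  // only a real flush may advance the recovery point
        flushedOffset = endOffset;
        recoveryPoint = flushedOffset;
    }

    // Invariant: nothing below the recovery point is unflushed.
    boolean safe() { return recoveryPoint <= flushedOffset; }
}
```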
> Avoid fsync on log segment roll
> Key: KAFKA-615
> URL: https://issues.apache.org/jira/browse/KAFKA-615
> Project: Kafka
> Issue Type: Bug
> Reporter: Jay Kreps
> Assignee: Neha Narkhede
> Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, KAFKA-615-v3.patch, KAFKA-615-v4.patch, KAFKA-615-v5.patch, KAFKA-615-v6.patch
> It still isn't feasible to run without an application-level fsync policy. This is a problem because fsync locks the file, and tuning such a policy is very challenging: the flushes must not be so frequent that seeks reduce throughput, yet not so infrequent that each fsync writes so much data that there is a noticeable jump in latency.
> The remaining problem is the way that log recovery works. Our current policy is that if a clean shutdown occurs we do no recovery. If an unclean shutdown occurs we recover the last segment of all logs. To make this correct we need to ensure that each segment is fsync'd before we create a new segment. Hence the fsync during roll.
> Obviously, if the fsync during roll is the only time fsync occurs, it will potentially write out the entire segment, which for a 1GB segment at 50MB/sec might take many seconds. The goal of this JIRA is to eliminate this and make it possible to run with no application-level fsyncs at all, depending entirely on replication and background writeback for durability.
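One way to read that goal: drop the fsync during roll, persist a recovery point instead, and after an unclean shutdown re-validate every segment that may hold unflushed data rather than only the last one. A minimal sketch under those assumptions, with hypothetical names (this is not Kafka's actual Log/LogSegment API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model: segments tracked by base offset only.
class RollWithoutFsync {
    final List<Long> segmentBaseOffsets = new ArrayList<>();
    long recoveryPoint = 0;  // everything below this is known to be flushed

    void roll(long newBaseOffset) {
        // No fsync here: the previous segment may still have dirty pages.
        segmentBaseOffsets.add(newBaseOffset);
    }

    // After an unclean shutdown, recover every segment whose range extends
    // past the recovery point, not just the last segment.
    List<Long> segmentsNeedingRecovery(long logEndOffset) {
        List<Long> dirty = new ArrayList<>();
        for (int i = 0; i < segmentBaseOffsets.size(); i++) {
            long end = (i + 1 < segmentBaseOffsets.size())
                    ? segmentBaseOffsets.get(i + 1)
                    : logEndOffset;
            if (end > recoveryPoint) dirty.add(segmentBaseOffsets.get(i));
        }
        return dirty;
    }
}
```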