Kafka >> mail # user >> log file flush?


Jason Huang 2013-02-19, 11:28
Swapnil Ghike 2013-02-19, 11:51
Swapnil Ghike 2013-02-19, 11:54
Jason Huang 2013-02-19, 12:06
Jun Rao 2013-02-19, 16:26
Jay Kreps 2013-02-19, 18:00
Jason Huang 2013-02-20, 04:03
Jay Kreps 2013-02-20, 04:28
Re: log file flush?
Very detailed and clear explanation.

Thanks a lot!

Jason

On Tue, Feb 19, 2013 at 11:28 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> Yes, exactly. Here is the full story:
>
> When you restart Kafka, it checks whether a clean shutdown was executed
> on the log (which would have left a marker file). If the shutdown was
> clean, it assumes the log was fully flushed and uses it as is. If not
> (as in the case of a hard kill or machine crash), it executes recovery
> on the log. The recovery process validates the CRC of each message in
> the unflushed portion of the log and truncates the log to eliminate
> any partial writes that may have occurred while the server was killed.
> This process guarantees that only valid messages remain. There are
> actually a lot of corner cases in a hard crash: depending on the
> OS/FS, you can also get random corrupt blocks, and this process
> handles that case as well. In the case that you outline, this would
> mean that the log would contain the 100 messages flushed to disk
> (assuming the last message was fully written) but not (obviously) the
> 50 messages only in RAM.
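
The recovery step described above (validate each message's CRC, truncate at the first bad or partial one) can be sketched roughly as follows. This is only an illustration, not Kafka's code: the record layout (4-byte length, 4-byte CRC32, payload) and the names `encode`, `recover`, and `flushed_upto` are assumptions made for the example.

```python
import struct
import zlib

# Hypothetical record layout: payload length, then CRC32 of the payload.
HEADER = struct.Struct(">II")


def encode(payload: bytes) -> bytes:
    """Frame one message as length + CRC32 + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes, flushed_upto: int = 0) -> bytes:
    """Return the log truncated at the first incomplete or corrupt record.

    Bytes before `flushed_upto` (assumed to be a record boundary) are
    trusted; only the unflushed tail is validated.
    """
    pos = flushed_upto
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log):
            break  # partial write: the payload was cut short
        if zlib.crc32(log[pos + HEADER.size:end]) != crc:
            break  # corrupt record: stop at the last valid message
        pos = end
    return log[:pos]
```

With this framing, a log whose last record was only partly written comes back from `recover` containing every complete record before it and nothing after.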
>
> That all obviously describes the unreplicated case in 0.7.x. In 0.8
> you have the option of setting a replication factor for each topic,
> so you would lose the 50 messages in pagecache only if you lost
> ALL the replicas. If another in-sync replica survived, then when
> the server came back up it would resync with the new leader, which
> would have the full log, and there would be no loss of committed
> messages.
>
> -Jay
>
>
> On Tue, Feb 19, 2013 at 8:03 PM, Jason Huang <[EMAIL PROTECTED]> wrote:
>> This starts to make sense to me.
>>
>> So a log segment file (000000000.log) may have some messages that are
>> on the local hard drive and some that are still in the pagecache? Say
>> a 0000000.log file has 150 messages, the first 100 have been flushed
>> to the local hard drive, and the last 50 are still in the pagecache:
>> what would happen if the machine crashes? When we restart the
>> server, will we see the 000000.log file with only 100 messages in it?
>>
>> Thanks,
>>
>> Jason
>>
>> On Wed, Feb 20, 2013 at 1:59 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>> To be clear: to lose data in the filesystem you need to hard kill the
>>> machine. A hard kill of the process will not cause that.
>>>
>>> -Jay
>>>
>>> On Tue, Feb 19, 2013 at 8:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>>> Jason,
>>>>
>>>> Although messages are always written to the log segment file, they
>>>> are initially only in the file system's pagecache. As Swapnil mentioned
>>>> earlier, messages are flushed to disk periodically. If you do a clean
>>>> shutdown (kill -15), we close all log files, which should flush all dirty
>>>> data to disk. If you do a hard kill or your machine crashes, the
>>>> unflushed data may be lost. The data that you saw in the .log file may
>>>> be only in the pagecache.
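
To illustrate the distinction described above: a write makes data visible in the file right away, but only an explicit flush asks the OS to move it from the pagecache to the device. A minimal Python sketch, assuming a hypothetical file name `segment.log` and a hypothetical helper `append`; it is not how Kafka itself is implemented.

```python
import os
import tempfile


def append(fd: int, data: bytes, durable: bool = False) -> None:
    """Append data to the file; optionally force it to the storage device."""
    os.write(fd, data)  # lands in the OS pagecache; readers can see it at once
    if durable:
        os.fsync(fd)    # asks the OS to write this file's dirty pages to the device


path = os.path.join(tempfile.mkdtemp(), "segment.log")  # hypothetical file name
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)

append(fd, b"message-1\n")                # readable now, not guaranteed on disk
with open(path, "rb") as f:
    visible = f.read()                    # typically served from the pagecache

append(fd, b"message-2\n", durable=True)  # fsync covers all dirty data for the file
os.close(fd)
```

Seeing `message-1` in the file therefore says nothing about whether it has reached the disk; only the `fsync` call gives that assurance.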
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Tue, Feb 19, 2013 at 4:05 AM, Jason Huang <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Thanks for the response.
>>>>>
>>>>> My confusion is this: once I see the message content in the .log
>>>>> file, doesn't that mean the message has already been flushed to the
>>>>> hard drive? Why would those messages still get lost if someone
>>>>> manually kills the process (or if the server crashes unexpectedly)?
>>>>>
>>>>> Jason
>>>>>
>>>>> On Tue, Feb 19, 2013 at 6:53 AM, Swapnil Ghike <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> > Correction - The flush happens based on *number of messages* and time
>>>>> > limits, whichever is hit first.
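
The "number of messages or time limit, whichever is hit first" rule can be sketched as a small policy object. This is an assumption-laden illustration rather than the broker's actual logic: the class `FlushPolicy` and the parameters `max_messages`, `max_seconds`, and `clock` are hypothetical names.

```python
import time
from typing import Callable


class FlushPolicy:
    """Decide when to flush: after N messages or T seconds, whichever comes first."""

    def __init__(self, max_messages: int, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_messages = max_messages
        self.max_seconds = max_seconds
        self.clock = clock          # injectable so the time limit can be tested
        self.unflushed = 0
        self.last_flush = clock()

    def record_append(self) -> bool:
        """Count one appended message; return True if a flush is now due."""
        self.unflushed += 1
        return self.should_flush()

    def should_flush(self) -> bool:
        if self.unflushed == 0:
            return False  # nothing dirty, so nothing to flush
        too_many = self.unflushed >= self.max_messages
        too_old = self.clock() - self.last_flush >= self.max_seconds
        return too_many or too_old

    def mark_flushed(self) -> None:
        """Reset both limits after the caller has flushed the log."""
        self.unflushed = 0
        self.last_flush = self.clock()
```

A caller would invoke `record_append()` after each write and `should_flush()` from a periodic timer, flushing and calling `mark_flushed()` whenever either returns True.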
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 2/19/13 3:50 AM, "Swapnil Ghike" <[EMAIL PROTECTED]> wrote:
>>>>> >
>>>>> >>The flush happens based on size and time limits,
>>>>> >>whichever is hit first.
>>>>> >
>>>>>