This starts to make sense to me.
So a log segment file (000000000.log) may have some messages that are on
the local hard drive and some that are only in the pagecache? Say a
000000000.log file has 150 messages, the first 100 have been flushed to
the local hard drive, and the last 50 are still in the pagecache. What
happens if the machine crashes? When we restart the server, will we see
the 000000000.log file with only 100 messages in it?
On Wed, Feb 20, 2013 at 1:59 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> To be clear: to lose data in the filesystem you need to hard kill the
> machine. A hard kill of the process will not cause that.
> On Tue, Feb 19, 2013 at 8:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> Although messages are always written to the log segment file, they
>> initially are only in the file system's pagecache. As Swapnil mentioned
>> earlier, messages are flushed to disk periodically. If you do a clean
>> shutdown (kill -15), we close all log files, which should flush all dirty
>> data to disk. If you do a hard kill or your machine crashes, the
>> unflushed data may be lost. The data that you see in the .log file may
>> still be only in the pagecache.
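A minimal sketch of the distinction described above, in Python rather
than Kafka's own code. It assumes a POSIX-like OS; the helper names
(append_message, count_messages, demo) and the message format are made
up for illustration. It appends 150 messages, forces only the first 100
to disk with os.fsync, and then reads the file back. The reader sees all
150 because reads are served from the pagecache, so seeing a message in
the .log file says nothing about whether it has reached the disk.

```python
import os
import tempfile


def append_message(fd, payload):
    """Append one newline-terminated message (hypothetical format).

    os.write hands the bytes to the OS; they sit in the pagecache until
    the OS writes them back or someone calls fsync.
    """
    os.write(fd, payload + b"\n")


def count_messages(path):
    """Count messages visible to a reader of the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "000000000.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            for i in range(100):
                append_message(fd, b"message-%d" % i)
            # Ask the OS to push everything written so far to the device.
            os.fsync(fd)
            for i in range(100, 150):
                append_message(fd, b"message-%d" % i)
            # No fsync for the last 50: they may exist only in the pagecache.
            visible = count_messages(path)
        finally:
            os.close(fd)
    return visible


if __name__ == "__main__":
    print(demo())
```

The script cannot simulate a machine crash, so it only shows what a
reader sees while the OS is up. How many of the last 50 messages would
survive a power loss depends on what the OS had already written back.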
>> On Tue, Feb 19, 2013 at 4:05 AM, Jason Huang <[EMAIL PROTECTED]> wrote:
>>> Thanks for the response.
>>> My confusion is this: once I see the message content in the .log
>>> file, doesn't that mean the message has already been flushed to the
>>> hard drive? Why would those messages still get lost if someone
>>> manually kills the process (or if the server crashes unexpectedly)?
>>> On Tue, Feb 19, 2013 at 6:53 AM, Swapnil Ghike <[EMAIL PROTECTED]> wrote:
>>> > Correction - The flush happens based on *number of messages* and time
>>> > limits, whichever is hit first.
>>> > On 2/19/13 3:50 AM, "Swapnil Ghike" <[EMAIL PROTECTED]> wrote:
>>> >>The flush happens based on size and time limits,
>>> >>whichever is hit first.
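A rough sketch of a "whichever is hit first" flush rule, using the
corrected wording from this thread (number of messages and a time
limit). This is not Kafka's implementation; the class FlushPolicy and
its parameters max_messages and max_interval_s are hypothetical names
chosen for illustration, and the clock is injectable so the time limit
can be exercised without sleeping.

```python
import time


class FlushPolicy:
    """Decide when to flush: message count or elapsed time, whichever first."""

    def __init__(self, max_messages, max_interval_s, clock=time.monotonic):
        self.max_messages = max_messages      # hypothetical count limit
        self.max_interval_s = max_interval_s  # hypothetical time limit (seconds)
        self.clock = clock
        self.unflushed = 0
        self.last_flush = clock()

    def record_append(self, count=1):
        """Note that `count` messages were appended but not yet flushed."""
        self.unflushed += count

    def should_flush(self):
        """True once either limit is reached and there is something to flush."""
        if self.unflushed == 0:
            return False
        if self.unflushed >= self.max_messages:
            return True
        return self.clock() - self.last_flush >= self.max_interval_s

    def mark_flushed(self):
        """Reset both counters after the caller has flushed to disk."""
        self.unflushed = 0
        self.last_flush = self.clock()
```

A caller would invoke record_append after each write, check
should_flush, and call mark_flushed once the fsync has completed.
Anything appended since the last flush is what a machine crash could
lose.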