Yes, exactly. Here is the full story:

When you restart Kafka it checks whether a clean shutdown was executed on
the log (which would have left a marker file). If the shutdown was
clean, it assumes the log was fully flushed and uses it as is. If not
(as in the case of a hard kill or machine crash) it runs recovery
on the log. The recovery process validates the CRC of each message in
the unflushed portion of the log and truncates the log to eliminate
any partial writes that may have occurred when the server was killed.
This process guarantees that only valid messages remain. There are
actually a lot of corner cases after a hard crash: depending
on the OS/filesystem you can also get random corrupt blocks, so this
process handles that case as well. In the scenario you outline, this
would mean that the log would contain the 100 messages flushed to disk
(assuming the last message was fully written) but not, obviously, the
50 messages that were only in RAM.
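To make the recovery step concrete, here is a minimal sketch in Python. It assumes a deliberately simplified record layout of [4-byte length][4-byte CRC32][payload] rather than Kafka's real on-disk format; `recover_log` and `append_record` are hypothetical helpers, not Kafka APIs:

```python
import struct
import zlib

def append_record(f, payload):
    """Write one [length][crc32][payload] record to an open file."""
    f.write(struct.pack(">II", len(payload), zlib.crc32(payload) & 0xFFFFFFFF))
    f.write(payload)

def recover_log(path):
    """Scan the log front to back, validating each record's CRC, and
    truncate at the first partial or corrupt record. Returns the number
    of valid bytes kept."""
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from(">II", data, pos)
        payload = data[pos + 8 : pos + 8 + length]
        if len(payload) < length or (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
            break  # partial write or corrupt block: stop scanning here
        pos += 8 + length
    with open(path, "r+b") as f:
        f.truncate(pos)  # drop everything past the last valid record
    return pos
```

A torn write at the tail (say, a header written but only part of the payload) fails either the length check or the CRC check, so everything up to it survives and everything after it is dropped, which is the guarantee described above.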

That all describes the unreplicated case in 0.7.x. In 0.8
you have the option of setting a replication factor for each topic,
so you would only lose the 50 messages in pagecache if you lost
ALL the replicas. If you had another in-sync surviving replica, then
when the failed server came back up it would resync from the new leader,
which would have the full log, and there would be no loss of committed
messages.
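The two durability knobs discussed here, replication and flush frequency, surface as broker configuration. A hedged sketch of the relevant settings (key names may vary between Kafka versions; check the documentation for the release you run):

```properties
# Replication (0.8+): keep each partition on 3 brokers so a single
# hard crash loses nothing that was committed.
default.replication.factor=3

# Flush policy: force an fsync after this many messages...
log.flush.interval.messages=10000
# ...or after this many milliseconds, whichever comes first.
log.flush.interval.ms=1000
```

Note that with replication the flush settings matter much less, since the surviving replicas, not the local disk, are what protect the messages still sitting in pagecache.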

On Tue, Feb 19, 2013 at 8:03 PM, Jason Huang <[EMAIL PROTECTED]> wrote:
