Kafka, mail # user - Re: Messages from producer are immediately going to /tmp/logs in kafka - 2013-10-14, 17:29
Re: Messages from producer are immediately going to /tmp/logs in kafka
I believe this is the first complaint we have received about a lack of data loss.

Kafka's behavior is to write all messages to the filesystem immediately.
The operating system then syncs the file to disk at its own pace (the
operations section of the Kafka docs describes how Linux does this, and it
is well documented elsewhere on the internet). As the docs say, the
configuration you are describing only controls how often Kafka forces an
fsync; it has nothing to do with writing to the filesystem, which is always
immediate. Fsync makes the OS write the data in its cache to physical disk.
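
To make the write/fsync distinction concrete, here is a minimal Python
sketch. It is not Kafka code: the file name, the append_message helper and
its force_sync flag are all made up for illustration. It only shows that
data is readable from the file as soon as write() returns, and that fsync
is a separate, optional step that asks the OS to push its cache to disk.

```python
import os
import tempfile


def append_message(fd, payload, force_sync=False):
    """Append payload to the file; optionally force it to physical disk.

    Hypothetical helper for illustration only, not Kafka's implementation.
    """
    written = os.write(fd, payload)  # lands in the OS page cache right away
    if force_sync:
        os.fsync(fd)  # ask the OS to flush its cached data for this file
    return written


# Throwaway directory and file name, chosen only for this example.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "example.log")

fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
try:
    # No fsync here, yet the bytes are already visible in the file.
    append_message(fd, b"message-1\n")
    with open(log_path, "rb") as reader:
        visible_before_fsync = reader.read()

    # Same write path, followed by an explicit fsync.
    append_message(fd, b"message-2\n", force_sync=True)
finally:
    os.close(fd)

print(visible_before_fsync)
```

The first message shows up in the file without any fsync, which matches
what you are seeing in the log directory.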

This makes forcing message loss a little hard. Killing the process won't
work because the data is not held in application memory; it is in the
filesystem cache. Shutting down the machine won't cause loss either, since
the OS flushes the data to disk before shutting down. If you want to force
data loss, I think you need to yank the plug on the machine immediately
after a write, before either an application-level fsync or the OS's own
flush policy has run.
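
The three failure cases can be written down as a toy model. This is only a
sketch of the reasoning above, with a hypothetical ToyMachine class
standing in for the real OS and disk; it does not touch Kafka or any real
files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMachine:
    """Toy model: messages sit in the OS page cache until they are synced."""

    page_cache: List[str] = field(default_factory=list)  # written, not on disk
    disk: List[str] = field(default_factory=list)        # physically durable

    def write(self, message):
        # A write goes straight to the filesystem (page cache), not app memory.
        self.page_cache.append(message)

    def fsync(self):
        # Move everything cached so far onto the physical disk.
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def kill_process(self):
        # The page cache belongs to the OS, so killing the process loses nothing.
        pass

    def clean_shutdown(self):
        # The OS flushes its cache before the machine powers off.
        self.fsync()

    def power_loss(self):
        # Pulling the plug discards whatever has not reached the disk yet.
        self.page_cache.clear()

    def surviving(self):
        return self.disk + self.page_cache


def run(event):
    machine = ToyMachine()
    machine.write("m1")
    machine.fsync()          # m1 is now on disk
    machine.write("m2")      # m2 is written but not yet synced
    getattr(machine, event)()
    return machine.surviving()


after_kill = run("kill_process")        # ['m1', 'm2']
after_shutdown = run("clean_shutdown")  # ['m1', 'm2']
after_power_loss = run("power_loss")    # ['m1']
print(after_kill, after_shutdown, after_power_loss)
```

Only the power-loss case drops m2, and only because it happened between
the write and the next sync.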

-Jay

On Mon, Oct 14, 2013 at 10:00 AM, Monika Garg <[EMAIL PROTECTED]> wrote:
 