Kafka, mail # user - Re: Keeping logs forever - 2013-02-22, 21:39
Anthony Grimes 2013-02-22, 00:07
Eric Tschetter 2013-02-22, 00:30
Milind Parikh 2013-02-22, 00:44
Jay Kreps 2013-02-22, 01:26
graham sanderson 2013-02-22, 02:47
Anthony Grimes 2013-02-22, 05:33
Re: Keeping logs forever
Apologies for asking another question as a newbie without having really
tried things out, but one of our main reasons for wanting to use Kafka
(not the LinkedIn use case) is exactly the fact that the "buffer" is not
just for buffering. We want to keep data for days to weeks and be able to
add ad-hoc consumers after the fact (obviously we could do that from
downstream systems in HDFS). For example, let's say we have N machines
gathering approximate runtime statistics for real-time use in live web
applications; it is easy for them to listen to the stream destined for
HDFS and keep such stats. If we have to add a new machine, or one dies,
etc., it totally makes sense to use the same code and just have it replay
the last H hours of events to get back up to speed.
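The "replay the last H hours" idea above can be sketched. A minimal sketch, assuming a modern Kafka client: the topic name, group id, and hour count here are hypothetical, and the time-indexed seek (e.g. kafka-python's `offsets_for_times`, added well after this 2013 thread) is shown in comments since it needs a running broker. The replay window itself is just epoch-millisecond arithmetic:

```python
import time

def replay_start_ms(hours, now_ms=None):
    """Epoch-millisecond timestamp H hours in the past, where replay begins."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - hours * 3600 * 1000

# With a broker available, a consumer coming up fresh might then do
# (hypothetical topic/group names, kafka-python API):
#
# consumer = KafkaConsumer("web-stats", group_id="stats-node-7")
# parts = [TopicPartition("web-stats", p)
#          for p in consumer.partitions_for_topic("web-stats")]
# offsets = consumer.offsets_for_times(
#     {tp: replay_start_ms(6) for tp in parts})
# for tp, offset_and_ts in offsets.items():
#     if offset_and_ts is not None:
#         consumer.seek(tp, offset_and_ts.offset)  # rewind 6 hours

# Fixed instant so the arithmetic is visible:
print(replay_start_ms(6, now_ms=1_000_000_000_000))
```

The consumer then processes records normally; after H hours' worth of events its in-memory stats converge with the other machines'.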

Sorry if my comments caused this type of concern.  Keeping days to weeks
of data around is normal in Kafka (it defaults to keeping 7 days' worth,
but that's configurable), and replaying from that is definitely within
the realm of what it does well.  My comments were more about the
"forever" part, and as Jay says, even that should be possible; you just
have to keep adding more disks and machines to store all the data.
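For reference, the 7-day default mentioned above lives in the broker configuration. A hypothetical `server.properties` excerpt (the commented values are illustrative, not from this thread):

```properties
# Time-based retention: keep data for one week (the broker default).
log.retention.hours=168
# For "days to weeks", just raise it, e.g. three weeks:
# log.retention.hours=504
# Size-based retention cap; -1 means unlimited.
log.retention.bytes=-1
```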

I believe the replication in 0.8 will also allow data to be migrated if
you lose nodes and such, so maybe my concerns were poorly founded.

Jay Kreps 2013-02-23, 04:13