Kafka >> mail # user >> Keeping logs forever

Re: Keeping logs forever

Is there a reason you wouldn't want to just push the data into something
built for cheap, long-term storage (like Glacier, S3, or HDFS) and perhaps
"replay" from that instead of from the Kafka brokers?  I can't speak for
Jay, Jun or Neha, but I believe the expected usage of Kafka is essentially
as a buffering mechanism to take the edge off the natural ebb and flow of
unpredictable internet traffic.  Highly available, long-term storage of
data is probably not at the top of their list of use cases when making
design decisions.
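A minimal, self-contained sketch of that archive-and-replay pattern (plain Python standing in for Kafka and S3/HDFS; none of these names are a real Kafka API): the broker acts as a bounded buffer, an archiving consumer mirrors every message into cheap storage, and replay reads from the archive rather than the brokers.

```python
# Illustrative sketch only: a deque with maxlen stands in for a Kafka topic
# with finite retention, and a plain list stands in for S3/HDFS cold storage.
from collections import deque

BUFFER_RETENTION = 3                              # broker keeps only newest N messages

broker_buffer = deque(maxlen=BUFFER_RETENTION)    # stand-in for a Kafka topic
cold_storage = []                                 # stand-in for S3/Glacier/HDFS

def produce(message):
    """Producer path: write to the broker; the archiving consumer copies
    every message into long-term storage before it can age out."""
    broker_buffer.append(message)
    cold_storage.append(message)                  # archiver keeps full history

def replay_all():
    """Replay from cold storage, not the brokers, so the full history is
    available even after the broker has discarded old segments."""
    return list(cold_storage)

for i in range(10):
    produce(f"event-{i}")

print(list(broker_buffer))   # only the last 3 events survive on the broker
print(replay_all())          # the archive still holds all 10
```

The point of the split is that the broker only ever needs enough retention to absorb traffic spikes, while the archive grows without pressuring broker resources such as open file handles.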

On Thu, Feb 21, 2013 at 6:00 PM, Anthony Grimes <[EMAIL PROTECTED]> wrote:

> Our use case is that we'd like to log away data we don't currently need and
> potentially replay it at some point. We don't want to delete old logs. I
> googled around a bit and I only discovered this particular post:
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%[EMAIL PROTECTED]%3E
> In summary, it appears the primary issue is that Kafka keeps file handles
> of each log segment open. Is there a way to configure this, or is a way to
> do so planned? It appears that an option to deduplicate instead of delete
> was added recently, so doesn't the file handle issue exist with that as
> well (since files aren't being deleted)?
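For reference, the knobs involved look roughly like the following broker properties. These names are taken from later Kafka releases and may not match the 0.8-era version discussed in this thread, so treat this as a hedged sketch rather than a confirmed configuration:

```
# Disable time- and size-based deletion entirely (keep segments forever):
log.retention.ms=-1
log.retention.bytes=-1

# Or use compaction ("deduplicate instead of delete"), which keeps the
# latest value per key instead of removing whole segments:
log.cleanup.policy=compact

# Larger segments mean fewer files on disk, and therefore fewer open
# file handles, when many segments are retained:
log.segment.bytes=1073741824
```

Note that compaction does not resolve the file-handle concern raised above: segments are still retained, so the number of open handles is governed by segment size and count, not by the cleanup policy.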