Kafka >> mail # user >> Keeping logs forever


Re: Keeping logs forever
Forever is a long time. The key questions are how you define replay and how
you navigate across different versions of Kafka.

Example:
If you are storing market data in Kafka with a CEP engine running on top,
and would like to feed "transactions" back through it to ensure
replayability, then you would probably want to do that through the same
mechanism that existed at that time in the past. This might mean a
different Kafka broker (perhaps 0.7) with a different set of consumers and
a potentially different JVM. This, of course, gets into a rat hole.

Regards
Milind
On Thu, Feb 21, 2013 at 4:29 PM, Eric Tschetter <[EMAIL PROTECTED]> wrote:

> Anthony,
>
> Is there a reason you wouldn't want to just push the data into something
> built for cheap, long-term storage (like Glacier, S3, or HDFS) and perhaps
> "replay" from that instead of from the Kafka brokers? I can't speak for
> Jay, Jun or Neha, but I believe the expected usage of Kafka is essentially
> as a buffering mechanism to take the edge off the natural ebb and flow of
> unpredictable internet traffic. Highly available, long-term storage of
> data is probably not at the top of their list of use cases when making
> design decisions.
>
> --Eric
>
>
> On Thu, Feb 21, 2013 at 6:00 PM, Anthony Grimes <[EMAIL PROTECTED]> wrote:
>
> > Our use case is that we'd like to log away data we don't need and
> > potentially replay it at some point. We don't want to delete old logs. I
> > googled around a bit and found only this particular post:
> > http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%[EMAIL PROTECTED]%3E
> >
> > In summary, it appears the primary issue is that Kafka keeps file handles
> > of each log segment open. Is there a way to configure this, or is a way to
> > do so planned? It appears that an option to deduplicate instead of delete
> > was added recently, so doesn't the file handle issue exist with that as
> > well (since files aren't being deleted)?
> >
>
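
For a rough sense of scale on the file handle question above, here is a back-of-the-envelope sketch. It assumes the broker holds one open handle per segment file and that segments roll at a fixed size. The function name, the workload figures, and the 1 GiB segment size are illustrative assumptions, not values taken from this thread.

```python
def estimate_open_segment_files(partitions, bytes_per_partition,
                                segment_bytes, files_per_segment=1):
    """Rough count of segment files held open if no segment is ever deleted.

    Assumption: every segment of every partition keeps `files_per_segment`
    file handles open for the lifetime of the broker process.
    """
    # Integer ceiling division; a partition always has at least one segment.
    segments_per_partition = max(1, -(-bytes_per_partition // segment_bytes))
    return partitions * segments_per_partition * files_per_segment


GIB = 1024 ** 3
TIB = 1024 ** 4

# Hypothetical workload: 50 partitions that have each accumulated 2 TiB,
# with segments rolling every 1 GiB (an assumed setting).
handles = estimate_open_segment_files(50, 2 * TIB, GIB)
print(handles)
```

Comparing the result with the open-file limit of the broker process (for example, the value reported by `ulimit -n`) shows how quickly "keep everything" can run into the limit unless segments are made larger or old data is moved elsewhere.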

 