Kafka, mail # user - Keeping logs forever


Anthony Grimes 2013-02-22, 00:07
Eric Tschetter 2013-02-22, 00:30
Milind Parikh 2013-02-22, 00:44

Re: Keeping logs forever
Jay Kreps 2013-02-22, 01:26
You can do this and it should work fine. You would have to keep adding
machines to get disk capacity, of course, since your data set would
only grow.

Kafka keeps one open file descriptor per log segment file, but I think
that is okay. Just set the segment size to 1GB; with 10TB of storage that
is only about 10k files, which should be fine. Raise the OS open-FD limit
a bit if needed. File descriptors don't use much memory, so this
should not hurt anything.

-Jay
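
The arithmetic in the reply above can be checked with a short sketch. It assumes decimal units (1GB = 10^9 bytes, 10TB = 10^13 bytes) and one file descriptor per segment file. The function names and the `headroom` allowance for sockets and other files are hypothetical and not part of Kafka.

```python
def segment_count(storage_bytes: int, segment_bytes: int) -> int:
    """Number of segment files needed to hold storage_bytes (rounded up)."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-storage_bytes // segment_bytes)


def fits_fd_limit(n_files: int, headroom: int = 1024) -> bool:
    """Compare a file count with this process's soft open-file limit.

    headroom is an assumed allowance for sockets and other descriptors.
    """
    import resource  # standard library, Unix-only, so imported lazily

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return n_files + headroom <= soft


if __name__ == "__main__":
    n = segment_count(10 * 10**12, 10**9)  # 10TB of data in 1GB segments
    print(f"segment files: {n}")
    print(f"fits under current soft FD limit: {fits_fd_limit(n)}")
```

If the second check reports that the count does not fit, that is the case where the open-FD limit for the broker process would need to be raised.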

On Thu, Feb 21, 2013 at 4:00 PM, Anthony Grimes <[EMAIL PROTECTED]> wrote:
> Our use case is that we'd like to log away data we don't need and
> potentially replay it at some point. We don't want to delete old logs. I
> googled around a bit and only found this one post:
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%[EMAIL PROTECTED]%3E
>
> In summary, it appears the primary issue is that Kafka keeps file handles of
> each log segment open. Is there a way to configure this, or is a way to do
> so planned? It appears that an option to deduplicate instead of delete was
> added recently, so doesn't the file handle issue exist with that as well
> (since files aren't being deleted)?


graham sanderson 2013-02-22, 02:47
Eric Tschetter 2013-02-22, 21:39
Jay Kreps 2013-02-23, 04:13
Anthony Grimes 2013-02-22, 05:33