Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Kafka >> mail # user >> large amount of disk space freed on restart


Re: large amount of disk space freed on restart
Jay,

My only experience with this so far is with XFS.  It appears the XFS
behavior is evolving, and in fact we see somewhat different behavior
between the 2 CentOS kernel versions we have in use.  I've been trying to
ask questions about all this on the XFS.org mailing list, but so far I
haven't had much luck understanding how XFS versions correlate with CentOS
versions.

Anyway, yes, I think it would definitely be worth trying the solution you
suggest: close the file on rotation and re-open it read-only, or close
files after a few hours of not being accessed.  If a patch for one of
these approaches can be cobbled together, I'd love to test it out in our
staging environment.  I'd be willing to experiment with such a patch
myself, although I'm not 100% sure of all the places to look (but I
might dive in).
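A minimal sketch of the close-on-rotation idea, in Python rather than Kafka's own code. SegmentedLog, segment_bytes and the zero-padded file naming are hypothetical, chosen only for illustration. The point is that each rolled segment gets a real close() on its writable descriptor before being reopened read-only, which, going by the XFS behavior described here, is when unused preallocation may be released.

```python
import os


class SegmentedLog:
    """Hypothetical append-only log split into size-capped segment files.

    Not Kafka's implementation: on each roll the writable handle of the full
    segment is closed, then the file is reopened read-only for consumers.
    """

    def __init__(self, directory, segment_bytes):
        self.directory = directory
        self.segment_bytes = segment_bytes
        self.readers = {}     # base offset -> read-only handle of a rolled segment
        self.base_offset = 0  # byte position where the active segment starts
        self.active = open(self._path(self.base_offset), "ab")

    def _path(self, base_offset):
        # Zero-padded names keep segments sorted; the width is an assumption.
        return os.path.join(self.directory, f"{base_offset:020d}.log")

    def append(self, record: bytes):
        # Roll when the record would overflow a non-empty active segment.
        if self.active.tell() > 0 and self.active.tell() + len(record) > self.segment_bytes:
            self._roll()
        self.active.write(record)
        self.active.flush()

    def _roll(self):
        size = self.active.tell()
        self.active.close()  # real close() of the writable descriptor
        self.readers[self.base_offset] = open(self._path(self.base_offset), "rb")
        self.base_offset += size
        self.active = open(self._path(self.base_offset), "ab")

    def close(self):
        self.active.close()
        for handle in self.readers.values():
            handle.close()
```

Only the active segment ever holds a writable descriptor; rolled segments stay readable through their "rb" handles.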

XFS appears to have the option of using dynamic, speculative
preallocation, in which case it progressively doubles the amount of space
reserved for a file as the file grows.  It does do this for all open
files.  When the file is closed, it releases the preallocated space that
is not in use.  It's not clear whether this release happens immediately on
close, or whether re-opening the file read-only right away will keep it
from releasing the space (still trying to gather more info on that).
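One way to answer that question empirically is to compare a file's apparent size (st_size) with the space actually reserved for it (st_blocks, assumed to be in 512-byte units as on Linux), once while the file is open and again right after closing it. The helper names below are hypothetical; this is a sketch of an experiment, not a tool from the thread.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for `path`.

    st_size is the logical length readers see. st_blocks is assumed to be in
    512-byte units (true on Linux), so st_blocks * 512 approximates the space
    reserved on disk, including any speculative preallocation past EOF.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def measure_before_and_after_close(path, chunk=b"x" * 65536, chunks=64):
    """Hypothetical experiment: append data, then compare sizes while the
    file is still open and again right after close()."""
    with open(path, "ab") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push data to disk so blocks are really allocated
        while_open = apparent_and_allocated(path)
    after_close = apparent_and_allocated(path)
    return while_open, after_close
```

If the allocated figure after close is noticeably smaller than the one taken while open, that would suggest the filesystem trimmed its reservation on close.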

I haven't looked closely at the index files, but they too appear to show
this behavior (i.e. the allocated size stays on the order of double the
actual size until the app is restarted).
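Kafka sizes each new index file up front; the quoted message below mentions RandomAccessFile.setLength(10MB). A rough Python equivalent, assuming a Unix-like system where both calls typically extend the file with ftruncate(), shows how such a file can report a large length while having few or no blocks allocated. create_sparse_index and sizes are hypothetical names.

```python
import os

INDEX_BYTES = 10 * 1024 * 1024  # 10MB, the index size mentioned in the thread


def create_sparse_index(path, size=INDEX_BYTES):
    """Extend an empty file to `size` bytes without writing any data.

    On filesystems that support sparse files, no data blocks are allocated
    for the unwritten range.
    """
    with open(path, "wb") as f:
        f.truncate(size)


def sizes(path):
    """Return (apparent, allocated) bytes, assuming Linux st_blocks units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Comparing these numbers with the doubling seen on the index files could help separate sparse-file effects from XFS preallocation.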

Jason
On Fri, Jul 26, 2013 at 12:46 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Interesting.
>
> Yes, Kafka keeps all log files open indefinitely. There is no inherent
> reason this needs to be the case, though; it would be possible to LRU out
> old file descriptors, closing them if they are not accessed for a few
> hours and reopening them on the first access. We just haven't implemented
> anything like that.
>
> It would be good to understand this a little better. Does xfs pre-allocate
> space for all open files? Perhaps just closing the file on log roll and
> opening it read-only would solve the issue? Is this at all related to the
> use of sparse files for the indexes (i.e. RandomAccessFile.setLength(10MB)
> when we create the index)? Does this affect other filesystems or just xfs?
>
> -Jay
>
>
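Jay's LRU suggestion could look roughly like the following. This is a hypothetical Python sketch, not Kafka code: IdleHandleCache, max_idle_seconds and the injectable clock are assumptions made for illustration. Handles are kept in access order, so eviction only needs to scan from the oldest end.

```python
import time
from collections import OrderedDict


class IdleHandleCache:
    """Hypothetical LRU cache of read-only file handles (not Kafka code).

    evict_idle() closes handles not accessed for `max_idle_seconds`;
    get() transparently reopens a file on its next access.
    """

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self._handles = OrderedDict()  # path -> (handle, last_access), oldest first

    def get(self, path):
        entry = self._handles.pop(path, None)
        if entry is None or entry[0].closed:
            handle = open(path, "rb")  # reopen on first access after eviction
        else:
            handle = entry[0]
        self._handles[path] = (handle, self.clock())  # now the most recently used
        return handle

    def evict_idle(self):
        now = self.clock()
        closed = []
        while self._handles:
            path, (handle, last_access) = next(iter(self._handles.items()))
            if now - last_access < self.max_idle_seconds:
                break  # everything after this entry was accessed more recently
            handle.close()
            del self._handles[path]
            closed.append(path)
        return closed

    def close_all(self):
        for handle, _ in self._handles.values():
            handle.close()
        self._handles.clear()
```

The class is not thread-safe; a broker would need locking and some periodic caller for evict_idle(), both left out here.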
> On Fri, Jul 26, 2013 at 12:42 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > It looks like xfs will reclaim the preallocated space for a file after
> > it is closed.
> >
> > Does kafka close a file after it has reached its max size and started
> > writing to the next log file in sequence?  Or does it keep them all open
> > until they are deleted, or until the server quits (that's what it seems
> > like)?
> >
> > I could imagine that it might need to keep log files open, in order to
> > allow consumers access to them.  But does it keep them open indefinitely,
> > after there is no longer any data to be written to them, and no consumers
> > are currently attempting to read from them?
> >
> > Jason
> >
> >
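Whether a process still holds rolled segments open can be checked directly on Linux, where each open descriptor appears as a symlink under /proc/<pid>/fd (lsof shows the same information). A Linux-only sketch; open_files_under is a hypothetical helper.

```python
import os


def open_files_under(pid, directory):
    """List files under `directory` that process `pid` currently holds open.

    Linux-only: each entry in /proc/<pid>/fd is a symlink to the opened file.
    Deleted-but-still-open files show up with a " (deleted)" suffix.
    """
    prefix = os.path.realpath(directory) + os.sep
    fd_dir = f"/proc/{pid}/fd"
    found = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
        if target.startswith(prefix):
            found.add(target)
    return sorted(found)
```

Pointed at a broker's pid and its log directory, this would show whether old segments remain open after they stop receiving writes.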
> > On Tue, Jul 16, 2013 at 4:32 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > Interesting. Yes it will respect whatever setting it is given for new
> > > segments created from that point on.
> > >
> > > -Jay
> > >
> > >
> > > On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ok,
> > > >
> > > > An update on this.  It seems we are using XFS, which is available in
> > > > newer versions of CentOS.  It definitely does pre-allocate space as a
> > > > file grows, see:
> > > >
> > > > http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file
> > > >
> > > > Apparently it's not hard-allocated space, and it would be released
> > > > under resource pressure... so it seems we may need to update how we
> > > > monitor disk space usage, etc.
> > > >
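For monitoring, one option is to track two numbers per data directory: the sum of file lengths (what `du --apparent-size` reports) and the space actually allocated (roughly what plain `du` reports). On XFS, a persistent gap between them would point at preallocation rather than real data. directory_usage is a hypothetical helper and assumes Linux st_blocks semantics.

```python
import os
import stat


def directory_usage(root):
    """Return (apparent_bytes, allocated_bytes) summed over regular files under root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and other non-regular files
            apparent += st.st_size
            allocated += st.st_blocks * 512  # 512-byte units on Linux
    return apparent, allocated
```

Alerting on the apparent total, or on both, avoids treating reclaimable preallocation as a real shortage.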
> > > > But it seems that the default log file size of 1.1Gb causes it to

 