Kafka >> mail # user >> large amount of disk space freed on restart


Jason Rosenberg 2013-05-23, 04:46
Jonathan Creasy 2013-05-23, 04:51
Jason Rosenberg 2013-05-23, 05:17
Jonathan Creasy 2013-05-23, 05:25
Jason Rosenberg 2013-05-23, 05:49
Jun Rao 2013-05-23, 14:15
Jason Rosenberg 2013-05-23, 18:06
Jun Rao 2013-05-24, 03:56
Jason Rosenberg 2013-07-14, 08:37
Jay Kreps 2013-07-14, 16:45
Jason Rosenberg 2013-07-16, 18:23
Jay Kreps 2013-07-16, 20:32
Jason Rosenberg 2013-07-26, 07:43
Jay Kreps 2013-07-26, 16:46
Jason Rosenberg 2013-07-26, 21:00
Jay Kreps 2013-07-26, 21:03
Mike Heffner 2013-09-09, 15:18
Re: large amount of disk space freed on restart
This could certainly be done. It would be slightly involved since you would
need to implement some kind of file-handle cache for both indexes and log
files and re-open them on demand when a read occurs. If someone wants to
take a shot at this, the first step would be to get a design wiki in place
on how this would work. This is potentially nice to reduce the open file
count (though open files are pretty cheap).
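
As a rough illustration of the file-handle cache idea above, here is a minimal sketch in Python. It is not Kafka code: the class name `FileHandleCache`, its `capacity` parameter, and the least-recently-used eviction policy are all assumptions made for the example. It keeps a bounded number of files open and re-opens a closed file on demand when a read occurs.

```python
from collections import OrderedDict


class FileHandleCache:
    """Hypothetical cache that keeps at most `capacity` files open.

    When the limit is exceeded, the least recently used handle is closed.
    A later read of that file simply re-opens it.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._handles = OrderedDict()  # path -> open file object, oldest first

    def read(self, path, offset, length):
        """Read up to `length` bytes at `offset`, re-opening the file if needed."""
        handle = self._handles.pop(path, None)
        if handle is None:
            # Opened read-only: this sketch only serves reads of existing files.
            handle = open(path, "rb")
        self._handles[path] = handle  # re-insert as most recently used
        self._evict()
        handle.seek(offset)
        return handle.read(length)

    def _evict(self):
        # Close the oldest handles until we are back under the limit.
        while len(self._handles) > self.capacity:
            _, oldest = self._handles.popitem(last=False)
            oldest.close()

    def open_count(self):
        return len(self._handles)

    def close_all(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()
```

A real implementation would also have to cover index files, concurrent readers, and the active segment that is still being appended to, which is where most of the complexity would come from.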

That said, this issue only impacts XFS, and it seems to be fixed by the
setting Jonathan found. I wonder if you could give that a try and see if it
works for you too? I feel dealing with closed files adds a lot of
complexity, so if there is an easy fix I would probably rather avoid it.

-Jay
On Mon, Sep 9, 2013 at 8:17 AM, Mike Heffner <[EMAIL PROTECTED]> wrote:

> We are also seeing this problem with version 0.7.1 and logs on an XFS
> partition. At our largest scale we can frequently free over 600GB of disk
> usage by simply restarting Kafka. We've examined the `lsof` output from the
> Kafka process, and while it does appear to have FDs open for all log files
> on disk (even those that have not been read for a long time), it does not
> have any files open that were previously deleted from disk.
>
> `du` output confirms that the on-disk usage is much larger than the
> apparent size:
>
> root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h 00000000242666442619.kafka
> 1.1G 00000000242666442619.kafka
> root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h --apparent-size 00000000242666442619.kafka
> 513M 00000000242666442619.kafka
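
The same comparison can be scripted across a whole log directory. The following is a minimal sketch, assuming a Linux or other Unix system where `os.stat` reports `st_blocks` in 512-byte units. The helper names `disk_usage` and `overallocated` and the 1.5x threshold are made up for the example.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # st_size is what `du --apparent-size` reports; st_blocks is counted in
    # 512-byte units and reflects the space actually allocated on disk.
    return st.st_size, st.st_blocks * 512


def overallocated(path, ratio=1.5):
    """True if the file occupies more than `ratio` times its apparent size."""
    apparent, allocated = disk_usage(path)
    return apparent > 0 and allocated > apparent * ratio
```

For the segment shown above, the allocated size (1.1G) is a little over twice the apparent size (513M), so a check like `overallocated(path)` would flag it.
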
>
>
> Our log size/retention policy is:
>
> log.file.size=536870912
> log.retention.hours=96
>
> We tried dropping the caches as suggested on Stack Overflow
> (sync; echo 3 > /proc/sys/vm/drop_caches), but that didn't seem to clear up
> the extra space. We haven't had the chance to try remounting with the
> allocsize option.
>
> In summary, it would be great if Kafka closed FDs to log files that
> hadn't been read from for some period of time, if that addresses this issue.
>
> Cheers,
>
> Mike
>
> On Fri, Jul 26, 2013 at 5:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Cool, good to know.
> >
> >
> > On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Jay,
> > >
> > > My only experience so far with this is using XFS.  It appears the XFS
> > > behavior is evolving, and in fact, we see somewhat different behavior
> > > from 2 of our CentOS kernel versions in use.  I've been trying to ask
> > > questions about all this on the XFS.org mailing list, but so far I'm not
> > > having much luck understanding how XFS versions correlate to CentOS
> > > versions.
> > >
> > > Anyway, yes, I think it would definitely be worth trying the solution
> > > you suggest, which would be to close the file on rotation, and re-open
> > > read-only.  Or to close files after a few hours of not being accessed.
> > > If a patch for one of these approaches can be cobbled together, I'd love
> > > to test it out on our staging environment.  I'd be willing to experiment
> > > with such a patch myself, although I'm not 100% sure of all the places to
> > > look (but might dive in).
> > >
> > > XFS appears to have the option of using dynamic, speculative
> > > preallocation, in which case it progressively doubles the amount of space
> > > reserved for a file as the file grows.  It does this for all open
> > > files.  If the file is closed, it will then release the preallocated
> > > space not in use.  It's not clear whether this release of space happens
> > > immediately on close, or whether re-opening the file read-only
> > > immediately will keep it from releasing space (still trying to gather
> > > more info on that).
> > >
> > > I haven't looked too much at the index files, but those too appear to
> > > have this behavior (e.g. preallocated size is always on the order of
> > > double the actual size, until the app is restarted).
> > >
> > > Jason
> > >
> > >
> > > On Fri, Jul 26, 2013 at 12:46 PM, Jay Kreps <[EMAIL PROTECTED]>

 
Jason Rosenberg 2013-09-09, 18:41
Mike Heffner 2013-09-09, 21:07