Kafka, mail # user - large amount of disk space freed on restart


Re: large amount of disk space freed on restart
Jason Rosenberg 2013-09-09, 18:41
Sorry, I forgot to close the loop on my experience with this.

We solved this issue by setting the 'allocsize' mount option in fstab,
e.g. allocsize=16M.
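
For anyone wanting to check whether a broker is affected, or whether the mount option took effect, the comparison made with du and du --apparent-size further down the thread can also be scripted. The following is a minimal sketch, not something from the thread: the function names are made up for illustration, and it assumes a Unix-like system where st_blocks is reported in 512-byte units.

```python
import os


def allocation_overhead(path):
    """Return (apparent_bytes, allocated_bytes, overhead_bytes) for one file."""
    st = os.stat(path)
    apparent = st.st_size
    # st_blocks counts 512-byte units, independent of the filesystem block size.
    allocated = st.st_blocks * 512
    return apparent, allocated, allocated - apparent


def report(log_dir):
    """Print per-file overhead under log_dir and return the total in bytes."""
    total = 0
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            apparent, allocated, overhead = allocation_overhead(path)
            total += overhead
            print(f"{path}: apparent={apparent} allocated={allocated} "
                  f"overhead={overhead}")
    return total
```

Calling report() on the Kafka log directory (for example the /raid0/kafka-logs path that appears in the quoted message, used here only as an illustration) gives a rough total of space held beyond the data actually written. The overhead can be negative for sparse files, so treat the number as an indication rather than an exact figure.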

Jason
On Mon, Sep 9, 2013 at 1:47 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> This could certainly be done. It would be slightly involved since you would
> need to implement some kind of file-handle cache for both indexes and log
> files and re-open them on demand when a read occurs. If someone wants to
> take a shot at this, the first step would be to get a design wiki in place
> on how this would work. This is potentially nice to reduce the open file
> count (though open files are pretty cheap).
>
> That said this issue only impacts xfs and it seems to be fixed by that
> setting jonathan found. I wonder if you could give that a try and see if it
> works for you too? I feel dealing with closed files does add a lot of
> complexity so if there is an easy fix I would probably rather avoid it.
>
> -Jay
>
>
> On Mon, Sep 9, 2013 at 8:17 AM, Mike Heffner <[EMAIL PROTECTED]> wrote:
>
> > We are also seeing this problem with version 0.7.1 and logs on an XFS
> > partition. At our largest scale we can frequently free over 600GB of disk
> > usage by simply restarting Kafka. We've examined the `lsof` output from the
> > Kafka process and while it does appear to have FDs open for all log files
> > on disk (even those long past being read from), it does not have any files
> > open that were previously deleted from disk.
> >
> > The du output agrees that the allocated size is much larger than the
> > apparent size:
> >
> > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h 00000000242666442619.kafka
> > 1.1G 00000000242666442619.kafka
> > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h --apparent-size 00000000242666442619.kafka
> > 513M 00000000242666442619.kafka
> >
> >
> > Our log size/retention policy is:
> >
> > log.file.size=536870912
> > log.retention.hours=96
> >
> > We tried dropping the caches per the Stack Overflow suggestion
> > (sync; echo 3 > /proc/sys/vm/drop_caches) but that didn't seem to clear up
> > the extra space. We haven't had the chance to try remounting with the
> > allocsize option.
> >
> > In summary, it would be great if Kafka would close FDs for log files that
> > haven't been read from for some period of time, if that addresses this issue.
> >
> > Cheers,
> >
> > Mike
> >
> > On Fri, Jul 26, 2013 at 5:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > Cool, good to know.
> > >
> > >
> > > On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > >
> > > > Jay,
> > > >
> > > > My only experience so far with this is using XFS.  It appears the XFS
> > > > behavior is evolving, and in fact, we see somewhat different behavior
> > > > from 2 of our CentOS kernel versions in use.  I've been trying to ask
> > > > questions about all this on the XFS.org mailing list, but so far I'm
> > > > not having much luck understanding how XFS versions correlate to
> > > > CentOS versions.
> > > >
> > > > Anyway, yes, I think it would definitely be worth trying the solution
> > > > you suggest, which would be to close the file on rotation and re-open
> > > > it read-only, or to close files after a few hours of not being
> > > > accessed.  If a patch for one of these approaches can be cobbled
> > > > together, I'd love to test it out on our staging environment.  I'd be
> > > > willing to experiment with such a patch myself, although I'm not 100%
> > > > sure of all the places to look (but might dive in).
> > > >
> > > > XFS appears to have the option of using dynamic, speculative
> > > > preallocation, in which case it progressively doubles the amount of
> > > > space reserved for a file as the file grows.  It does this for all
> > > > open files.  If the file is closed, it will then release the
> > > > preallocated space not in use.  It's not clear whether this releasing
> > > > of space happens immediately on

 