Kafka, mail # user - large amount of disk space freed on restart


Re: large amount of disk space freed on restart
Jason Rosenberg 2013-07-14, 08:37
An update on this.  It appears that the disk space freed on restart is not
due to files getting deleted; instead, files are getting truncated on
restart.  It looks like log files get pre-allocated to a larger size than is
used right away.  Upon restart, they get truncated to the size of the data
they actually contain.  Does this make sense?

Before restart, I see a large number of log files of size 2.1 GB.  Upon
restart, the disk space they use drops to almost half that, on average.
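
One way to check the truncation theory is to record each log file's apparent
size and its allocated size before and after a restart. The following is a
minimal sketch, not anything from Kafka itself: the function names and the
idea of walking the log directory are assumptions made for illustration, and
it relies on st_blocks being reported in 512-byte units, as on Linux.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


def report_directory(log_dir):
    """Yield (path, apparent_bytes, allocated_bytes) for each file under log_dir."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            apparent, allocated = size_report(path)
            yield path, apparent, allocated
```

Running report_directory on the broker's log directory before and after a
restart, and diffing the two listings, should show whether the same files
shrink (truncation) or whether files disappear (deletion).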

Thoughts?

Jason
On Thu, May 23, 2013 at 8:55 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> I haven't seen this issue before. We do have ~1K topics in one of the Kafka
> clusters at LinkedIn.
>
> Thanks,
>
> Jun
>
>
> On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> wrote:
>
> > Yeah, that's what it looks like to me (looking at the code).  So, I'm
> > guessing it's some os level caching, resource recycling.  Have you ever
> > heard of this happening?  One thing that might be different in my usage
> > from the norm is a relatively large number of topics (e.g. ~2K topics).
> >
> > Jason
> >
> >
> > On Thu, May 23, 2013 at 7:14 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > Jason,
> > >
> > > > Kafka closes the handles of all deleted files. Otherwise, the broker
> > > > would quickly run out of file handles.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > > So, does this indicate that Kafka (or the JVM itself) is not
> > > > > aggressively closing file handles of deleted files?  Is there a fix
> > > > > for this?  Or is there not likely anything to be done?  What happens
> > > > > if the disk fills up with file handles for phantom deleted files?
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan Creasy <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > It isn't uncommon: if a process has an open file handle on a file
> > > > > > that is deleted, the space is not freed until the handle is closed.
> > > > > > So restarting the process that has a handle on the file would also
> > > > > > cause the space to be freed.
> > > > >
> > > > > You can troubleshoot that with lsof.
> > > > > > Normally, I see 2-4 log segments deleted every hour in my brokers.
> > > > > > I see log lines like this:
> > > > >
> > > > > 2013-05-23 04:40:06,857  INFO [kafka-logcleaner-0] log.LogManager -
> > > > > Deleting log segment 00000000035434043157.kafka from <redacted
> topic>
> > > > >
> > > > > > However, it seems like if I restart the broker, a massive amount of
> > > > > > disk space is freed (without a corresponding flood of these log
> > > > > > segment deleted messages).  Is there an explanation for this?  Does
> > > > > > Kafka keep references to file segments around, and reuse them as
> > > > > > needed or something?  And then on restart, the references to those
> > > > > > free segment files are dropped?
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > This is with 0.7.2.
> > > > >
> > > > > Jason
> > > > >
> > > >
> > >
> >
>
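
The deleted-but-still-open explanation quoted above can also be checked from a
script, roughly what filtering lsof output for deleted entries would show.
This is a hedged, Linux-only sketch: it assumes the /proc/<pid>/fd layout and
the " (deleted)" suffix the kernel appends to unlinked targets, and the
function name is made up for illustration.

```python
import os


def deleted_open_files(pid):
    """Return [(fd, target, size_bytes)] for descriptors of pid whose file was unlinked."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if target.endswith(" (deleted)"):
                # stat() follows the /proc link to the still-open inode,
                # so st_size is the space that a close would release.
                size = os.stat(link).st_size
                found.append((int(fd), target, size))
        except OSError:
            # The descriptor was closed while we were scanning; skip it.
            continue
    return found
```

Calling deleted_open_files with the broker's pid (reading another user's
process typically needs matching permissions or root) and summing the sizes
gives an estimate of how much space a restart would free for this reason. An
empty result would point away from leaked handles and toward the truncation
theory.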