Kafka, mail # user - Broker crashes when no space left for log.dirs


Re: Broker crashes when no space left for log.dirs
Jason Rosenberg 2013-08-16, 20:47
Ok,

I didn't realize the write to disk was immediate (is that new in 0.8, with
requested acks enabled?).

I do think the OS will indeed reserve space in advance for data not yet
flushed to disk.  This seems to be true, at least, for xfs, with which I
have had more experience lately.

Jason
On Thu, Aug 15, 2013 at 11:30 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> I am saying we always immediately write to the fs. So the question is: is it
> possible with delayed allocation in ext4 to do a successful write that
> later cannot be flushed to disk due to running out of space? I don't know
> the answer to this, though I would hope it is not possible.
>
> Basically if our write to the fs succeeds and replicas acknowledge then we
> send back the ack.
>
> -Jay
>
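[Editor's note: the write-vs-flush distinction Jay describes above can be sketched as follows. This is an illustration of the general OS behavior, not Kafka's actual I/O path; with delayed allocation, a successful write() only hands data to the page cache, and a no-space error may not surface until fsync().]

```python
import errno
import os
import tempfile

def durable_append(path, payload):
    """Append payload to a file and flush it to the device.

    os.write() can succeed even on a nearly full disk, because the data
    only goes to the page cache; with delayed allocation the filesystem
    may not reserve blocks until os.fsync(), which is where ENOSPC could
    appear.  Returns "durable" on success, "no-space-on-flush" if the
    flush itself fails for lack of space.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, payload)      # buffered write; usually succeeds
        try:
            os.fsync(fd)           # forces block allocation + write-out
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return "no-space-on-flush"
            raise
        return "durable"
    finally:
        os.close(fd)
```

Only after the flush (and, in Kafka's case, replica acknowledgment) would it be safe to send the producer its ack.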
>
> On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> wrote:
>
> > Hmmm....I guess I was thinking that a broker could receive a message and
> > keep it in memory, before having disk space reserved for its eventual
> > storage.  Are you saying that memory is not allocated for a message
> > without there already being disk space allocated for it?  In which case,
> > there should be no problem!
> >
> > Jason
> >
> >
> > On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > I don't think the filesystem will overcommit its disk space, but I'm
> > > actually not sure. I think this would only come into play on a fs like
> > > ext4, which does lazy block allocation in addition to lazy writing. But
> > > I think even ext4 is probably not allowed to hand out more disk space
> > > than it has.
> > >
> > >
> > > On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > A related question: will producers sending messages with
> > > > acknowledgment get a failed ack if a broker is out of disk space, or
> > > > will messages get buffered in memory successfully (resulting in a good
> > > > ack, before failing to be written)?
> > > >
> > > > It seems like it might be a good feature to have the broker
> > > > auto-detect if its log dir is nearing full, so that there is some
> > > > runway to gracefully shut down while still writing any in-memory
> > > > buffered messages.  It could be an optional threshold, like 98% full,
> > > > or X MB free, etc.
> > > >
> > > > Jason
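[Editor's note: the threshold check Jason proposes could be sketched like this. The function names and defaults are hypothetical, not an actual Kafka config; the point is just the two-limit test, a used-fraction ceiling or an absolute free-space floor.]

```python
import shutil

def crosses_threshold(total_bytes, free_bytes, max_used_fraction, min_free_bytes):
    """True if either limit is crossed: the used fraction exceeds
    max_used_fraction, or fewer than min_free_bytes remain."""
    used_fraction = (total_bytes - free_bytes) / total_bytes
    return used_fraction > max_used_fraction or free_bytes < min_free_bytes

def log_dir_nearly_full(log_dir, max_used_fraction=0.98, min_free_bytes=0):
    """Check the filesystem holding log_dir against the thresholds."""
    usage = shutil.disk_usage(log_dir)  # namedtuple: total, used, free (bytes)
    return crosses_threshold(usage.total, usage.free,
                             max_used_fraction, min_free_bytes)
```

A broker-side monitor could poll such a check periodically and initiate a graceful shutdown, flushing whatever is buffered, once it returns True.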
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[EMAIL PROTECTED]>
> > wrote:
> > > >
> > > > > The crash is actually just a call to shutdown. We think this is the
> > > > > right thing to do, though I agree it is unintuitive. Here is why.
> > > > > When you get an out-of-space error, it is likely that the operating
> > > > > system did a partial write, leaving you with a corrupt log.
> > > > > Furthermore, it is possible that space will free up, at which point
> > > > > more writes on the log could succeed, so you wouldn't even know there
> > > > > was a problem, but all your consumers would hit this data and choke.
> > > > >
> > > > > By "crashing" the node we ensure that recovery is run on the log to
> > > > > bring it into a consistent state.
> > > > >
> > > > > Theoretically we could leave the node up, accepting reads but
> > > > > rejecting writes, while attempting to recover the log. But there are
> > > > > a bunch of problems with this, and it is very complex. Likely, if you
> > > > > are out of space, you will just keep getting writes, running out of
> > > > > space again, then running recovery, and so on. This kind of crazy
> > > > > loop is much worse than just needing to bring the node back up.
> > > > >
> > > > > Alternately, we could leave the node up but go into some kind of
> > > > > write-rejecting mode forever. But this would still require that you
> > > > > restart the node, and we would have to implement that write-rejecting
> > > > > mode.
> > > > >
> > > > > Cheers,
> > > > >
> >
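[Editor's note: a minimal sketch of the kind of log recovery Jay refers to, assuming a simple length-prefixed record format. This is an illustration, not Kafka's actual log format or recovery code: a partial write at the tail, as an out-of-space crash would leave, is detected and truncated away.]

```python
import struct

def frame(payload):
    """Encode one record as [4-byte big-endian length][payload]
    (a hypothetical format for illustration)."""
    return struct.pack(">I", len(payload)) + payload

def recover_log(data):
    """Scan length-prefixed records from the front and return
    (valid_byte_count, records).  Scanning stops at the first record
    whose declared length runs past the end of the data, i.e. a record
    left incomplete by an interrupted write; the caller would truncate
    the log file to valid_byte_count."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        if pos + 4 + length > len(data):
            break                      # truncated tail record: discard
        records.append(data[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return pos, records
```

Running this scan on startup is what brings the log back to a consistent state after the forced shutdown, at the cost of discarding the unacknowledged partial record.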