Kafka >> mail # user >> Broker crashes when no space left for log.dirs


Re: Broker crashes when no space left for log.dirs
I am saying we always immediately write to the fs. So the question is: is it
possible, with delayed allocation in ext4, to do a successful write that
later cannot be flushed to disk because the disk ran out of space? I don't
know the answer to this, though I would hope it is not possible.

Basically if our write to the fs succeeds and replicas acknowledge then we
send back the ack.

-Jay
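
The distinction between a write that succeeds and a flush that happens later can be illustrated outside Kafka. What follows is a minimal sketch in plain Java NIO, not the broker's code: the class name `WriteThenForce` and the method `appendAndFlush` are made up for illustration. `FileChannel.write` hands bytes to the operating system, and `FileChannel.force(true)` asks the OS to persist them. Either call can throw an `IOException`, for example when the device is full. The sketch does not settle whether ext4 can accept a write it later cannot flush.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteThenForce {

    /**
     * Appends the payload to the file and then forces it to storage.
     * Returns the number of bytes written. Hypothetical helper, not Kafka code.
     */
    static int appendAndFlush(Path log, byte[] payload) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            int written = 0;
            while (buf.hasRemaining()) {
                // Step 1: hand the bytes to the OS. This may throw if the write fails.
                written += ch.write(buf);
            }
            // Step 2: ask the OS to persist data and metadata. This may also throw.
            ch.force(true);
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("segment", ".log");
        int n = appendAndFlush(log, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(n + " bytes written, file size " + Files.size(log));
        Files.delete(log);
    }
}
```

A caller that acknowledges only after `appendAndFlush` returns has seen both steps succeed. A caller that acknowledges after step 1 alone is relying on the filesystem not to fail at step 2, which is the open question in this thread.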
On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Hmmm... I guess I was thinking that a broker could receive a message and
> keep it in memory before having disk space reserved for its eventual
> storage. Are you saying that memory is not allocated for a message without
> disk space already being allocated for it? In which case, there
> should be no problem!
>
> Jason
>
>
> On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > I don't think the filesystem will overcommit its disk space, but I'm
> > actually not sure. I think this would only come into play on a fs like
> > ext4, which does lazy block allocation in addition to lazy writing. But I
> > think even ext4 is probably not allowed to hand out more disk space than
> > it has.
> >
> >
> > On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > A related question: will producers sending messages with acknowledgment
> > > get a failed ack if a broker is out of disk space, or will messages get
> > > buffered in memory successfully (resulting in a good ack before failing
> > > to be written)?
> > >
> > > It seems like it might be a good feature to have the broker auto-detect
> > > when its log dir is nearing full, so that there is some runway to shut
> > > down gracefully while still writing any in-memory buffered messages. It
> > > could be an optional threshold, like 98% full, or X MB free, etc.
> > >
> > > Jason
> > >
> > >
> > > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > >
> > > > The crash is actually just a call to shutdown. We think this is the
> > > > right thing to do, though I agree it is unintuitive. Here is why. When
> > > > you get an out-of-space error, it is likely that the operating system
> > > > did a partial write, leaving you with a corrupt log. Furthermore, it is
> > > > possible that space will free up, at which point more writes on the log
> > > > could succeed, so you wouldn't even know there was a problem, but all
> > > > your consumers would hit this data and choke.
> > > >
> > > > By "crashing" the node we ensure that recovery is run on the log to
> > > > bring it into a consistent state.
> > > >
> > > > Theoretically we could leave the node up accepting reads but rejecting
> > > > writes while attempting to recover the log. But there are a bunch of
> > > > problems with this, and it is very complex. Likely if you are out of
> > > > space you are just going to keep getting writes, running out of space
> > > > again, then running recovery, and so on. This kind of crazy loop is
> > > > much worse than just needing to bring the node back up.
> > > >
> > > > Alternately we could leave the node up but go into some kind of
> > > > write-rejecting mode forever. But this would still require that you
> > > > restart the node, and we would have to implement that write-rejecting
> > > > mode.
> > > >
> > > > Cheers,
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is more of a thought question than a problem that I need support
> > > > > for. I have been trying out Kafka 0.8.0-beta1 with replication. For our
> > > > > use case we want to try to guarantee that our consumers will see all
> > > > > messages even if they have fallen greatly behind the broker/producer.
> > > > > For this reason I wanted to know how the broker would react when the
> > > > > filesystem it
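
Jason's suggestion in the thread, an optional threshold such as 98% full or X MB free, could be sketched roughly as below. This is an illustration only: nothing in the thread says Kafka implements such a check, and the class `LogDirSpaceCheck`, the method `isNearlyFull`, and the example limits are all hypothetical. The sketch relies on the standard `java.io.File.getUsableSpace()` and `getTotalSpace()` calls, which report a hint rather than a guarantee, since other processes can consume space between the check and the next write.

```java
import java.io.File;

public class LogDirSpaceCheck {

    /**
     * Pure decision function: true when the used fraction reaches
     * maxUsedFraction or the usable bytes drop below minFreeBytes.
     */
    static boolean isNearlyFull(long usableBytes, long totalBytes,
                                double maxUsedFraction, long minFreeBytes) {
        if (totalBytes <= 0) {
            // Unknown or missing volume: treat it as full to stay on the safe side.
            return true;
        }
        double usedFraction = 1.0 - (double) usableBytes / totalBytes;
        return usedFraction >= maxUsedFraction || usableBytes < minFreeBytes;
    }

    /** Applies the same check to a directory on a real filesystem. */
    static boolean isNearlyFull(File logDir, double maxUsedFraction, long minFreeBytes) {
        return isNearlyFull(logDir.getUsableSpace(), logDir.getTotalSpace(),
                maxUsedFraction, minFreeBytes);
    }

    public static void main(String[] args) {
        // Assumed example values: 98% used, or less than 100 MB free.
        File logDir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        boolean full = isNearlyFull(logDir, 0.98, 100L * 1024 * 1024);
        System.out.println(logDir + " nearly full: " + full);
    }
}
```

A process could poll a check like this and begin a graceful shutdown once it returns true. Because the check is advisory, the out-of-space error on the write path would still need handling.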

 