Kafka >> mail # user >> Broker crashes when no space left for log.dirs


Bryan Baugher 2013-08-14, 16:52
Joel Koshy 2013-08-15, 00:06
Jay Kreps 2013-08-15, 02:59
Jason Rosenberg 2013-08-15, 17:20
Jay Kreps 2013-08-15, 17:58
Re: Broker crashes when no space left for log.dirs
Hmm, I guess I was thinking that a broker could receive a message and
keep it in memory before having disk space reserved for its eventual
storage.  Are you saying that memory is not allocated for a message without
there already being disk space allocated for it?  In which case, there
should be no problem!

Jason
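
The threshold idea raised earlier in this thread (for example 98% full, or X MB free) could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing Kafka feature or configuration: the class name `DiskSpaceGuard`, the method `isNearlyFull`, and the two limits are all hypothetical. It uses the standard `java.nio.file.FileStore` API to read the total and usable space of whatever filesystem holds the log directory.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Kafka: reports when a log directory's
// filesystem crosses a "nearly full" threshold.
public class DiskSpaceGuard {

    // Pure decision function: true when either limit is crossed.
    static boolean isNearlyFull(long totalBytes, long usableBytes,
                                double maxUsedFraction, long minFreeBytes) {
        if (totalBytes <= 0) {
            return true; // treat an empty or unreadable store as unsafe
        }
        double usedFraction = 1.0 - ((double) usableBytes / totalBytes);
        return usedFraction >= maxUsedFraction || usableBytes < minFreeBytes;
    }

    // Reads the real numbers for the filesystem that contains logDir.
    static boolean isNearlyFull(Path logDir, double maxUsedFraction,
                                long minFreeBytes) throws IOException {
        FileStore store = Files.getFileStore(logDir);
        return isNearlyFull(store.getTotalSpace(), store.getUsableSpace(),
                            maxUsedFraction, minFreeBytes);
    }

    public static void main(String[] args) throws IOException {
        // Assumed limits for illustration: 98% used, or under 512 MB free.
        Path logDir = Paths.get(args.length > 0 ? args[0] : ".");
        boolean nearlyFull = isNearlyFull(logDir, 0.98, 512L * 1024 * 1024);
        System.out.println(logDir.toAbsolutePath() + " nearly full: " + nearlyFull);
    }
}
```

A check like this would have to be polled periodically, and it only buys runway: other processes can still fill the disk between two polls, so it would complement rather than replace handling of the out-of-space error itself.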
On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> I don't think the filesystem will overcommit its disk space, but I'm
> actually not sure. I think this would only come into play on a fs like ext4
> which does lazy block allocation in addition to lazy writing. But I think
> even ext4 is probably not allowed to hand out more disk space than it has.
>
>
> On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > A related question:  Will producers sending messages with acknowledgment
> > get a failed ack if a broker is out of disk space, or will messages get
> > buffered in memory successfully (resulting in a good ack, before failing
> > to be written)?
> >
> > It seems like it might be a good feature to have the broker auto-detect
> > if its log dir is nearing full, so that there is some runway to shut down
> > gracefully while still writing any in-memory buffered messages.  It could
> > be an optional threshold, like 98% full, or X MB free, etc.
> >
> > Jason
> >
> >
> > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > The crash is actually just a call to shutdown. We think this is the
> > > right thing to do, though I agree it is unintuitive. Here is why. When
> > > you get an out of space error it is likely that the operating system
> > > did a partial write, leaving you with a corrupt log. Furthermore it is
> > > possible that space will free up, at which point more writes on the log
> > > could succeed, so you wouldn't even know there was a problem, but all
> > > your consumers would hit this data and choke.
> > >
> > > By "crashing" the node we ensure that recovery is run on the log to
> > > bring it into a consistent state.
> > >
> > > Theoretically we could leave the node up accepting reads but rejecting
> > > writes while attempting to recover the log. But there are a bunch of
> > > problems with this, and it is very complex. Likely if you are out of
> > > space you are just going to keep getting writes, running out of space
> > > again, then running recovery, and so on. This kind of crazy loop is
> > > much worse than just needing to bring the node back up.
> > >
> > > Alternatively we could leave the node up but go into some kind of
> > > write-rejecting mode forever. But this would still require that you
> > > restart the node, and we would have to implement that write-rejecting
> > > mode.
> > >
> > > Cheers,
> > >
> > > -Jay
> > >
> > >
> > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > This is more of a thought question than a problem that I need support
> > > > for. I have been trying out Kafka 0.8.0-beta1 with replication. For
> > > > our use case we want to try and guarantee that our consumers will see
> > > > all messages even if they have fallen greatly behind the
> > > > broker/producer. For this reason I wanted to know how the broker would
> > > > react when the filesystem it writes its messages to is full. What I
> > > > found was that the broker crashes and cannot be started until the
> > > > filesystem has space again.
> > > >
> > > > Is there, or would it make sense to provide, configuration allowing
> > > > the broker to reject writes in this case rather than crashing,
> > > > electing a new leader and attempting the write again? I can clearly
> > > > understand the use case that we don't want to 'lose' messages from the
> > > > producer and I could also see how lack of filesystem space could be
> > > > considered a machine failure, but with replication I would think if
> > > > you are running out of

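The reasoning quoted above (a partial write leaves a corrupt tail, and recovery brings the log back to a consistent state) can be illustrated with a small sketch. It is not Kafka's actual log format or recovery code: the record layout (4-byte length, 4-byte CRC32 of the payload, then the payload) and the names `LogRecoverySketch`, `encode`, `validPrefixLength`, and `recover` are assumptions made for the example. Recovery here means scanning from the start, stopping at the first incomplete or checksum-failing record, and truncating everything after the last valid one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: a toy append-only log, not Kafka's on-disk format.
public class LogRecoverySketch {
    static final int HEADER_BYTES = 8; // 4-byte length + 4-byte CRC32

    // Encodes one record as [length][crc32(payload)][payload].
    static byte[] encode(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putInt(payload.length)
                .putInt((int) crc.getValue())
                .put(payload)
                .array();
    }

    // Returns the number of leading bytes that form complete, valid records.
    static int validPrefixLength(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        int valid = 0;
        while (buf.remaining() >= HEADER_BYTES) {
            int length = buf.getInt();
            int storedCrc = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                break; // record was cut short, e.g. by a partial write
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if ((int) crc.getValue() != storedCrc) {
                break; // payload does not match its checksum
            }
            valid = buf.position();
        }
        return valid;
    }

    // "Recovery": drop everything after the last valid record.
    static byte[] recover(byte[] log) {
        return Arrays.copyOf(log, validPrefixLength(log));
    }

    // Small helper to build a log image from several byte arrays.
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] p : parts) out.put(p);
        return out.array();
    }

    public static void main(String[] args) {
        byte[] third = encode("third");
        // Simulate a disk-full partial write: the last record loses its tail.
        byte[] torn = Arrays.copyOf(third, third.length - 2);
        byte[] log = concat(encode("first"), encode("second"), torn);
        System.out.println("bytes on disk: " + log.length
                + ", bytes kept after recovery: " + recover(log).length);
    }
}
```

The sketch also shows why appending after a failed write is dangerous: if new records were written after the torn one, a scan like this would never reach them, and a reader without such a scan would hit the damaged bytes first.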
Jay Kreps 2013-08-15, 18:31
Jason Rosenberg 2013-08-16, 20:47