Kafka, mail # dev - Solution for blocking fsync in 0.8


Re: Solution for blocking fsync in 0.8
S Ahmed 2012-05-26, 09:50
so 40ms for how many messages and what kind of payload?

And any idea how much data is blocked? (msgs/payload)

Even though 40ms doesn't seem like much, it is definitely something that can
creep up in a high-load environment, and something you can't really monitor
unless you have some sort of metrics built into the system.

Maybe have this built in: http://metrics.codahale.com/
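
For illustration, here is a minimal sketch against the Yammer/Codahale
Metrics 2.x API that link points to; the class name, metric name, and file
path are made-up placeholders, and this is not how Kafka itself does it. A
Timer wrapped around the flush call surfaces exactly the latency in question:

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Timer;
import com.yammer.metrics.core.TimerContext;

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.TimeUnit;

public class FlushTimingSketch {
    // Tracks flush latency in milliseconds and flush rate per second
    private static final Timer FLUSH_TIMER = Metrics.newTimer(
            FlushTimingSketch.class, "log-flush-time",
            TimeUnit.MILLISECONDS, TimeUnit.SECONDS);

    public static void main(String[] args) throws Exception {
        // File path is just an example location for the sketch
        RandomAccessFile file = new RandomAccessFile("/tmp/flush-timing.log", "rw");
        FileChannel channel = file.getChannel();
        ByteBuffer payload = ByteBuffer.wrap(new byte[1024]);

        for (int i = 0; i < 1000; i++) {
            payload.rewind();
            channel.write(payload);               // append one 1 KB "message"
            TimerContext ctx = FLUSH_TIMER.time();
            try {
                channel.force(true);              // the fsync being measured
            } finally {
                ctx.stop();
            }
        }
        System.out.printf("flushes=%d mean=%.2fms max=%.2fms%n",
                FLUSH_TIMER.count(), FLUSH_TIMER.mean(), FLUSH_TIMER.max());
        channel.close();
        file.close();
    }
}

Metrics also ships console and JMX reporters, so numbers like these can be
exposed without extra plumbing on the produce path.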

On Fri, May 25, 2012 at 1:22 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> It depends a great deal on the hardware and the flush interval. I think for
> our older-generation hardware we saw an average flush time of 40ms; for the
> newer machines we just got it is much less, but I think that might be
> because the disks themselves have some kind of NVRAM or something.
>
> -Jay
>
> On Fri, May 25, 2012 at 7:09 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
> > In practice (at LinkedIn), how long do you see the calls blocked for
> > during fsyncs?
> >
> > On Thu, May 24, 2012 at 1:40 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > One issue with using the filesystem for persistence is that the
> > > synchronization in the filesystem is not great. In particular the fsync
> > > and fdatasync system calls block appends to the file, apparently for
> > > the entire duration of the fsync (which can be quite long). This is
> > > documented in some detail here:
> > >  http://antirez.com/post/fsync-different-thread-useless.html
> > >
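
This is not the test code referenced later in the thread, just a minimal
Java sketch of the effect being described: one thread appends 1 KB records
while another calls FileChannel.force(), and the slowest append is reported.
The file path and loop counts are arbitrary.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FsyncBlockingSketch {
    public static void main(String[] args) throws Exception {
        // Path is just an example location for the sketch
        RandomAccessFile file = new RandomAccessFile("/tmp/fsync-blocking.log", "rw");
        final FileChannel channel = file.getChannel();

        // Background thread issuing fsyncs, the way a flusher thread would
        Thread syncer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 200; i++) {
                        channel.force(true);
                        Thread.sleep(5);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        syncer.start();

        // Foreground appends; if fsync did not block writers, the slowest
        // append would stay in the low microseconds
        ByteBuffer record = ByteBuffer.wrap(new byte[1024]);
        long maxMicros = 0;
        for (int i = 0; i < 100000; i++) {
            record.rewind();
            long start = System.nanoTime();
            channel.write(record);
            long micros = (System.nanoTime() - start) / 1000;
            if (micros > maxMicros) maxMicros = micros;
        }
        syncer.join();
        System.out.println("slowest append: " + maxMicros + " us");
        channel.close();
        file.close();
    }
}

On ext3 in particular the slowest append tends to track the duration of the
concurrent fsync, which is roughly the behaviour the antirez post describes.
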
> > > This is a problem in 0.7 because our definition of a committed message
> > > is one written prior to calling fsync(). This is the only way to
> > > guarantee the message is on disk. We do not hand out any messages to
> > > consumers until an fsync call occurs. The problem is that regardless of
> > > whether the fsync is in a background thread or not, it will block any
> > > produce requests to the file. This is buffered a bit in the client
> > > since our produce request is effectively async in 0.7, but it can lead
> > > to weird latency spikes nonetheless as this buffering gets filled.
> > >
> > > In 0.8 with replication the definition of a committed message changes
> > > to one that is replicated to multiple machines, not necessarily
> > > committed to disk. This is a different kind of guarantee with different
> > > strengths and weaknesses (pro: data can survive destruction of the file
> > > system on one machine; con: you will lose a few messages if you haven't
> > > sync'd and the power goes out). We will likely retain the flush
> > > interval and time settings for those who want fine-grained control over
> > > flushing, but it is less relevant.
> > >
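
For reference, those flush settings surface as broker properties; a sketch
with illustrative values, assuming the property names used in the 0.8
example server.properties:

# flush after this many messages have accumulated in a log partition
log.flush.interval.messages=10000
# ...or after a message has sat unflushed this long, whichever comes first
log.flush.interval.ms=1000
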
> > > Unfortunately *any* call to fsync will block appends even in a
> > > background thread, so how can we give control over physical disk
> > > persistence without introducing high latency for the producer? The
> > > answer is that the Linux pdflush daemon actually does a very similar
> > > thing to our flush parameters. pdflush is a daemon running on every
> > > Linux machine that controls the writing of buffered/cached data back
> > > to disk. It lets you control when dirty pages are written back by
> > > giving it either a percentage of memory that may be dirty, a timeout
> > > after which any dirty page is written, or a fixed number of dirty
> > > bytes.
> > >
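
Those three knobs correspond to the standard vm.* sysctls; the values below
are illustrative only, not recommendations:

# start background writeback when dirty pages exceed this % of memory
sysctl -w vm.dirty_background_ratio=5
# write back any page that has been dirty longer than this (centiseconds)
sysctl -w vm.dirty_expire_centisecs=3000
# or cap dirty data by an absolute byte count instead of a percentage (2.6.29+)
sysctl -w vm.dirty_background_bytes=268435456
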
> > > The question is, does pdflush block appends? The answer seems to be
> > > mostly no. It locks the page being flushed but not the whole file. The
> > > time to flush one page is usually pretty quick (plus I think it may not
> > > be flushing just-written pages anyway). I wrote some test code for this
> > > and here are the results:
> > >
> > > I modified the code from the link above. Here are the results from my
> > > desktop (CentOS Linux 2.6.32).
> > >
> > > We run the test writing 1024 bytes every 100 us and flushing every
> > > 500 us:
> > >
> > > $ ./pdflush-test 1024 100 500
> > > 21
> > > 4
> > > 3
> > > 3
> > > 9
> > > 6
> > > Sync in 20277 us (0), sleeping for 500 us
> > > 19819