Kafka >> mail # dev >> Solution for blocking fsync in 0.8


Thread:
Jay Kreps 2012-05-24, 17:40
Chris Burroughs 2012-06-19, 01:21
S Ahmed 2012-05-25, 14:09
Jay Kreps 2012-05-25, 17:22
Re: Solution for blocking fsync in 0.8
So 40ms for how many messages, and what kind of payload?

And any idea how much data is blocked? (msgs/payload)

Even though 40ms doesn't seem like much, it is definitely something that
can creep up in a high-load environment, and something you can't really
monitor unless you have some sort of metrics built into the system.

Maybe have this built in: http://metrics.codahale.com/
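As a rough illustration of the kind of flush metric being suggested, here is a minimal JDK-only sketch (the class name `FlushTimer` is hypothetical; a real deployment would presumably use the Codahale/Dropwizard Metrics `Timer` linked above rather than hand-rolling this):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: track flush latency the way a metrics library's
// Timer would, using only the JDK. Records count, mean, and max.
public class FlushTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    // Time one flush operation and record its duration.
    public void time(Runnable flush) {
        long start = System.nanoTime();
        flush.run();
        long elapsed = System.nanoTime() - start;
        count.incrementAndGet();
        totalNanos.addAndGet(elapsed);
        maxNanos.accumulateAndGet(elapsed, Math::max);
    }

    public long count() { return count.get(); }

    public double meanMillis() {
        long n = count.get();
        return n == 0 ? 0.0 : totalNanos.get() / (double) n / 1_000_000.0;
    }

    public double maxMillis() { return maxNanos.get() / 1_000_000.0; }
}
```

With per-flush timing like this exposed, the occasional 40ms stall becomes visible as a spike in the max rather than disappearing into an average.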

On Fri, May 25, 2012 at 1:22 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> It depends a great deal on the hardware and the flush interval. I think
> for our older-generation hardware we saw an avg flush time of 40ms; for
> the newer stuff we just got it is much less, but I think that might be
> because the disks themselves have some kind of NVRAM or something.
>
> -Jay
>
> On Fri, May 25, 2012 at 7:09 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
> > In practice (at LinkedIn), how long do you see the calls blocked for
> > during fsyncs?
> >
> > On Thu, May 24, 2012 at 1:40 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > One issue with using the filesystem for persistence is that the
> > > synchronization in the filesystem is not great. In particular the
> > > fsync and fdatasync system calls block appends to the file, apparently
> > > for the entire duration of the fsync (which can be quite long). This
> > > is documented in some detail here:
> > > http://antirez.com/post/fsync-different-thread-useless.html
> > >
> > > This is a problem in 0.7 because our definition of a committed message
> > > is one written prior to calling fsync(). This is the only way to
> > > guarantee the message is on disk. We do not hand out any messages to
> > > consumers until an fsync call occurs. The problem is that regardless
> > > of whether the fsync is in a background thread or not, it will block
> > > any produce requests to the file. This is buffered a bit in the client
> > > since our produce request is effectively async in 0.7, but it can lead
> > > to weird latency spikes nonetheless as this buffering gets filled.
> > >
> > > In 0.8 with replication, the definition of a committed message changes
> > > to one that is replicated to multiple machines, not necessarily
> > > committed to disk. This is a different kind of guarantee with different
> > > strengths and weaknesses (pro: data can survive destruction of the
> > > file system on one machine; con: you will lose a few messages if you
> > > haven't sync'd and the power goes out). We will likely retain the
> > > flush interval and time settings for those who want fine-grained
> > > control over flushing, but it is less relevant.
> > >
> > > Unfortunately *any* call to fsync will block appends, even in a
> > > background thread, so how can we give control over physical disk
> > > persistence without introducing high latency for the producer? The
> > > answer is that the Linux pdflush daemon actually does a very similar
> > > thing to our flush parameters. pdflush is a daemon running on every
> > > Linux machine that controls the writing of buffered/cached data back
> > > to disk. It lets you control when dirty pages are written back by
> > > setting either a maximum percentage of memory that may be dirty, a
> > > timeout after which any dirty page is written out, or a fixed number
> > > of dirty bytes.
> > >
> > > The question is, does pdflush block appends? The answer seems to be
> > mostly
> > > no. It locks the page being flushed but not the whole file. The time to
> > > flush one page is actually usually pretty quick (plus I think it may
> not
> > be
> > > flushing just written pages anyway). I wrote some test code for this
> and
> > > here are the results:
> > >
> > > I modified the code from the link above. Here are the results from my
> > > desktop (Centos Linux 2.6.32).
> > >
> > > We run the test writing 1024 bytes every 100 us and flushing every
> > > 500 us:
> > >
> > > $ ./pdflush-test 1024 100 500
> > > 21
> > > 4
> > > 3
> > > 3
> > > 9
> > > 6
> > > Sync in 20277 us (0), sleeping for 500 us
> > > 19819