Kafka, mail # user - race condition with log flush interval settings...


Re: race condition with log flush interval settings...
Jason Rosenberg 2013-03-28, 19:05
Filed:  https://issues.apache.org/jira/browse/KAFKA-839

By the way, the JIRA project needs to be updated to list 0.7.2 as a
released version.

Jason
On Thu, Mar 28, 2013 at 7:58 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Jason,
>
> Could you file a jira so that we can track it?
>
> Thanks,
>
> Jun
>
> On Thu, Mar 28, 2013 at 12:43 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> wrote:
>
> > It looks like there is a race condition between the settings for the 2
> > properties:  log.default.flush.scheduler.interval.ms &
> > log.default.flush.interval.ms.  I'm using 0.7.2.
> >
> > By default, both of these are set to 3000 ms (and the docs recommend
> > setting flushInterval to a multiple of the flushSchedulerInterval).
> >
> > However, the code in LogManager.flushAllLogs (which is scheduled to
> > run at a fixed rate using the flushSchedulerInterval property) looks
> > like this:
> >
> >         val timeSinceLastFlush = System.currentTimeMillis - log.getLastFlushedTime
> >         var logFlushInterval = config.defaultFlushIntervalMs
> >         ....
> >         ....
> >         if(timeSinceLastFlush >= logFlushInterval)
> >           log.flush
> >
> > So, it will only flush a log if the time since the last flush is at
> > least the flush interval.  But log.lastFlushedTime is not set until
> > after flushing has completed (which can incur some I/O time), so it
> > is always slightly later than the scheduler tick that triggered the
> > flush.  By enabling TRACE logging for this method, I was able to see
> > that with the defaults, timeSinceLastFlush was usually about 2998 on
> > the following tick (which is less than the logFlushInterval of
> > 3000), so that tick skips the flush.  Thus, setting a flushInterval
> > equal to the flushSchedulerInterval essentially devolves to an
> > effective flushInterval of 2X the flushSchedulerInterval.
> >
> > So, setting a flushInterval slightly less than the
> > flushSchedulerInterval (e.g. 2500) should make the flush happen on
> > each scheduler invocation, as long as a flush takes less than the
> > difference between the two settings.
> >
> > It might make sense to change the logic gating the flush to
> > something like:
> >
> >       if(timeSinceLastFlush >= 0.90 * logFlushInterval)
> >
> > Also, the scheduler probably ought to use a 'fixedDelay' rather than
> > a 'fixedRate' schedule.
> >
> > Jason
> >
>
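
To make the timing described in this thread concrete, here is a minimal
Scala sketch that replays the gating check against simulated fixed-rate
scheduler ticks. It is not Kafka's LogManager code: FlushGateSketch,
flushTimes and flushCostMs are hypothetical names, there is no real I/O
or clock, and the 2 ms flush cost is an assumption chosen only to match
the ~2998 ms value reported above.

```scala
object FlushGateSketch {
  // Replays the gate `timeSinceLastFlush >= flushIntervalMs` over simulated
  // fixed-rate ticks and returns the tick times (ms) at which a flush fires.
  // flushCostMs models the I/O time that elapses before the last-flushed
  // timestamp is recorded (an assumption, not a measured value).
  def flushTimes(schedulerIntervalMs: Long,
                 flushIntervalMs: Long,
                 flushCostMs: Long,
                 ticks: Int): Seq[Long] = {
    var lastFlushedTime = 0L
    val fired = Seq.newBuilder[Long]
    for (i <- 1 to ticks) {
      val now = i * schedulerIntervalMs            // fixed-rate tick time
      val timeSinceLastFlush = now - lastFlushedTime
      if (timeSinceLastFlush >= flushIntervalMs) {
        fired += now
        // The timestamp is only updated once the flush has finished.
        lastFlushedTime = now + flushCostMs
      }
    }
    fired.result()
  }

  def main(args: Array[String]): Unit = {
    // Equal settings: every other tick is skipped.
    println(flushTimes(3000L, 3000L, 2L, 6))
    // Flush interval below the scheduler interval: every tick flushes.
    println(flushTimes(3000L, 2500L, 2L, 6))
  }
}
```

Under these assumptions, the first call flushes only at 3000, 9000 and
15000 ms (an effective 6000 ms interval), while the second flushes on
all six ticks, which matches the workaround of setting the flush
interval to 2500.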