Found the problem. Now we are sub-400ms.
While I was setting the per-topic flush interval, I failed to set the
scheduler interval via log.default.flush.scheduler.interval.ms.
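For anyone hitting the same thing, the broker settings end up looking
roughly like this (the topic name is made up; 20ms is the flush interval
from the thread below):

```properties
# Per-topic flush interval -- this part we had already set.
log.flush.intervals.ms.per.topic=mytopic:20

# The part we missed: the flush scheduler only wakes up this often,
# so it should be no larger than the smallest per-topic interval.
log.default.flush.scheduler.interval.ms=20
```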
I was going to open a JIRA to have this auto-set to the minimum of any
custom flush intervals, but it seems that it's moot in 0.8.
Thanks for the help!
On Thu, Nov 8, 2012 at 8:57 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> Oops, missed what you said--that you had already dropped the flush
> interval. Listen to Neha :-)
> On Thu, Nov 8, 2012 at 7:57 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> Hi Bob,
>> Currently the broker does not hand out messages to consumers until they
>> are flushed to disk, so the flush interval acts as a lower bound on
>> worst-case latency. Setting it lower should fix the problem.
>> This problem goes away in the next release: both the blocking on flush
>> and the fetcher backoff have been removed, which should drop latency to
>> a few ms.
>> On Wed, Nov 7, 2012 at 5:55 PM, Bob Cotton <[EMAIL PROTECTED]> wrote:
>>> We have a low-volume topic (~75msgs/sec) for which we would like to have
>>> low propagation delay from producer to consumer.
>>> We have 3 brokers, each with the default of 4 partitions, for a total
>>> of 12 partitions.
>>> The producer is sync, without compression. There are 8 producers, each
>>> producing 1/8 of the traffic.
>>> We are using the high-level Java consumer, with 4 threads consuming the
>>> topic.
>>> We wrap each message with a custom Encoder/Decoder that records
>>> currentTimeMillis() on the sender; the receiver does the same and
>>> records the propagation delay. All hosts are time-synced with NTP.
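>>> The wrapping scheme is roughly this (a sketch, not our actual code --
>>> the method names and payload format here are invented):
>>>
>>> ```java
>>> // Sender side: prepend an 8-byte send timestamp to the payload.
>>> byte[] encode(byte[] payload) {
>>>     ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
>>>     buf.putLong(System.currentTimeMillis());
>>>     buf.put(payload);
>>>     return buf.array();
>>> }
>>>
>>> // Receiver side: read the timestamp back and compute the delay.
>>> long propagationDelayMs(byte[] wrapped) {
>>>     ByteBuffer buf = ByteBuffer.wrap(wrapped);
>>>     long sentAt = buf.getLong();
>>>     return System.currentTimeMillis() - sentAt;
>>> }
>>> ```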
>>> With the broker's flush-messages and flush-interval settings left unset
>>> (defaulting to 500 msgs and 3000ms), the overall 95th percentile for
>>> propagation is 2,500ms.
>>> When we adjust the topic flush interval to 20ms, the 95th percentile
>>> drops to 1,700ms.
>>> When we adjust the consumer's "fetcher.backoff.ms" to 10, the 95th
>>> percentile drops to about 970ms.
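>>> For reference, the consumer property we changed for that last run
>>> (everything else left at defaults):
>>>
>>> ```properties
>>> # How long the fetcher sleeps when a fetch returns no data.
>>> # Shrinking this trades extra fetch requests for lower latency.
>>> fetcher.backoff.ms=10
>>> ```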
>>> We would like this to be sub-500ms.
>>> We could run with fewer partitions and/or more consumer threads.
>>> Anything glaring about this config? Anything we're missing?