Bob Cotton 2012-11-08, 01:55
Hello,
We have a low-volume topic (~75msgs/sec) for which we would like to have a low propagation delay from producer to consumer.
We have 3 brokers, each with the default of 4 partitions, for a total of 12 partitions. The producer is sync, without compression. There are 8 producers, each producing 1/8 of the traffic. We are using the high-level Java consumer, with 4 threads consuming the topic.
We wrap each message with a custom Encoder/Decoder that records currentTimeMillis() on the sender; the receiver does the same and records the propagation delay. All hosts are time-synced with NTP.
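A minimal sketch of the timestamp-wrapping idea, assuming a hypothetical wire format of an 8-byte send time followed by the payload. The class `TimestampedPayload` and its methods are illustrative only; this deliberately does not implement Kafka's own Encoder/Decoder interface, whose exact signature differs between releases.

```java
import java.nio.ByteBuffer;

/** Hypothetical wire format: 8-byte big-endian send timestamp, then the original payload. */
final class TimestampedPayload {
    final long sentAtMillis;
    final byte[] body;

    TimestampedPayload(long sentAtMillis, byte[] body) {
        this.sentAtMillis = sentAtMillis;
        this.body = body;
    }

    /** Producer side: prefix the payload with the sender's clock reading. */
    static byte[] encode(long sentAtMillis, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + body.length);
        buf.putLong(sentAtMillis);
        buf.put(body);
        return buf.array();
    }

    /** Consumer side: split the timestamp back off the front of the bytes. */
    static TimestampedPayload decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long sentAt = buf.getLong();
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return new TimestampedPayload(sentAt, body);
    }

    /** Propagation delay; only as accurate as the agreement between the two hosts' clocks. */
    long delayMillis(long receivedAtMillis) {
        return receivedAtMillis - sentAtMillis;
    }
}
```

In practice the producer would pass `System.currentTimeMillis()` to `encode`, and the consumer would call `delayMillis(System.currentTimeMillis())` right after `decode`. NTP skew between hosts shows up directly as error in the measured delay.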
With the broker's flush-messages and flush-interval settings unset (defaulting to 500 msgs and 3000ms), the overall 95th percentile for propagation is 2,500ms.
When we adjust the topic flush interval to 20ms, the 95th percentile drops to 1,700ms. When we adjust the consumer's "fetcher.backoff.ms" to 10, the 95th percentile drops to about 970ms.
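For reference, a 95th percentile over the recorded delays can be computed with the nearest-rank method sketched below. The thread does not say which percentile definition was used, so this choice (and the helper name `DelayStats.percentile`) is an assumption; interpolating definitions can give slightly different numbers on small samples.

```java
import java.util.Arrays;

final class DelayStats {
    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] delaysMillis, double p) {
        if (delaysMillis.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = delaysMillis.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to limit floating-point surprises
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```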
We would like this to be sub-500ms. We could run with fewer partitions and/or more consumer threads.
Anything glaring about this config? Anything we're missing?
Thanks -Bob
Neha Narkhede 2012-11-08, 02:58
Bob,
The latency that you are seeing seems a little high. I suspect that, at your low write rate, the # of partitions is too high. The simplest latency test would be one producer, one broker partition and one consumer. It would be great if you could give that a try.
Thanks, Neha
Jay Kreps 2012-11-08, 15:57
Hi Bob,
Currently the broker does not hand out messages to consumers until they are flushed to disk, so the flush interval acts as a lower bound on worst-case latency. Setting it lower should fix the problem.
This problem has been eliminated in the next release, since both the blocking on flush and the fetcher backoff have been removed; this should drop latency to a few ms.
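To see why the time-based interval, not the 500-message threshold, dominates at this traffic level, here is a rough back-of-the-envelope model. It assumes traffic is spread evenly over partitions and that a partition is flushed as soon as either threshold is reached; it ignores the flush scheduler's own tick, and the class and method names are hypothetical.

```java
final class FlushModel {
    /**
     * Rough worst-case time a message waits unflushed on one partition: the sooner of
     * (time to accumulate the flush message count) and (the flush time interval).
     */
    static double worstCaseUnflushedMs(double topicMsgsPerSec, int partitions,
                                       int flushEveryMessages, long flushIntervalMs) {
        double perPartitionMsgsPerSec = topicMsgsPerSec / partitions;
        double msToReachCount = flushEveryMessages / perPartitionMsgsPerSec * 1000.0;
        return Math.min(msToReachCount, (double) flushIntervalMs);
    }
}
```

With Bob's numbers, 75 msgs/sec over 12 partitions is 6.25 msgs/sec per partition, so reaching 500 messages would take about 80 s; the 3000ms time interval therefore always fires first, and `worstCaseUnflushedMs(75, 12, 500, 3000)` is 3000.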
-Jay
Jay Kreps 2012-11-08, 15:57
Oops, I missed what you said: you had already dropped the flush interval. Listen to Neha :-)
-Jay
Bob Cotton 2012-11-08, 22:29
Found the problem. Now we are under 400ms.
While I was setting the per-topic flush interval, I failed to set the scheduler interval via log.default.flush.scheduler.interval.ms
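A simplified model of why the scheduler interval mattered, assuming a background task that wakes every scheduler interval and flushes a log only if its configured flush interval has elapsed since the last flush. Under that assumption a log can only be flushed on a scheduler tick, so the effective flush period is the configured interval rounded up to the next multiple of the scheduler interval. This is an illustration of the behavior Bob describes, not Kafka's actual code; `effectiveFlushPeriodMs` is a hypothetical name.

```java
final class FlushScheduler {
    /** Effective flush period: the first scheduler tick at or after the configured flush interval. */
    static long effectiveFlushPeriodMs(long flushIntervalMs, long schedulerIntervalMs) {
        if (flushIntervalMs <= 0 || schedulerIntervalMs <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        // Ceiling division: number of whole ticks needed to cover the flush interval
        long ticks = (flushIntervalMs + schedulerIntervalMs - 1) / schedulerIntervalMs;
        return ticks * schedulerIntervalMs;
    }
}
```

Under this model, a 20ms per-topic flush interval checked by a scheduler ticking every 3000ms (an assumed value) still flushes only every 3000ms, which would explain why lowering the topic interval alone did not bring latency down; setting the scheduler interval to 20ms makes the effective period 20ms.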
I was going to open a JIRA to have this auto-set to the minimum of any custom flush intervals, but it seems that it's moot in 0.8.
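Bob's proposed auto-setting could look roughly like the sketch below: tick at least as often as the shortest per-topic flush interval, falling back to the configured default when no topic overrides exist. The map of per-topic intervals and the method name are hypothetical; no such helper exists in Kafka.

```java
import java.util.Collections;
import java.util.Map;

final class SchedulerIntervalSuggestion {
    /** Hypothetical helper: never tick slower than the shortest custom per-topic flush interval. */
    static long suggestedSchedulerIntervalMs(Map<String, Long> topicFlushIntervalsMs,
                                             long defaultSchedulerIntervalMs) {
        if (topicFlushIntervalsMs.isEmpty()) {
            return defaultSchedulerIntervalMs;
        }
        long shortest = Collections.min(topicFlushIntervalsMs.values());
        return Math.min(defaultSchedulerIntervalMs, shortest);
    }
}
```

One caveat, if flushes can only happen on scheduler ticks: a topic whose interval is not a multiple of the chosen tick is still flushed late (a 30ms interval checked every 20ms is flushed every 40ms). Using the greatest common divisor of the intervals instead of the minimum would avoid that.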
Thanks for the help!
- Bob