Kafka >> mail # user >> Consumer throughput imbalance


Re: Consumer throughput imbalance
Got it, thanks Jay

--
Ian Friedman
On Monday, August 26, 2013 at 2:37 PM, Jay Kreps wrote:

> Yes exactly.
>
> Lowering queuedchunks.max shouldn't help if the problem is what I
> described. That option controls how many chunks the consumer has ready in
> memory for processing. But we are hypothesizing that your problem is
> actually that the individual chunks are just too large, leading to the
> consumer spending a long time processing from one partition before it gets
> the next chunk.
>
> -Jay
>
>
> On Mon, Aug 26, 2013 at 11:18 AM, Ian Friedman <[EMAIL PROTECTED]> wrote:
>
> > Just to make sure I have this right, on the producer side we'd set
> > max.message.size and then on the consumer side we'd set fetch.size? I
> > admittedly didn't research how all the tuning options would affect us,
> > thank you for the info. Would queuedchunks.max have any effect?
> >
> > --
> > Ian Friedman
> >
> >
> > On Monday, August 26, 2013 at 1:26 PM, Jay Kreps wrote:
> >
> > > Yeah it is always equal to the fetch size. The fetch size needs to be at
> > > least equal to the max message size you have allowed on the server,
> > > though.
> > >
> > > -Jay
> > >
> > >
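The sizing rule above can be sketched as a config fragment, using the property names that appear in this thread (Kafka 0.7-era names; later releases renamed several of these). The values are purely illustrative:

```properties
# Broker: the largest message the server will accept.
max.message.size=1000000

# Consumer: bytes fetched per request from a partition -- this is also the
# size of the interleaved chunk. It must be at least max.message.size,
# or a maximum-size message could never be fetched.
fetch.size=1000000

# Consumer: how many fetched chunks are buffered in memory awaiting
# processing (the option discussed above; it does not change chunk size).
queuedchunks.max=10
```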
> > > On Sun, Aug 25, 2013 at 10:00 PM, Ian Friedman <[EMAIL PROTECTED]> wrote:
> > >
> > > > Jay - is there any way to control the size of the interleaved chunks?
> > > > The performance hit would likely be negligible for us at the moment.
> > > >
> > > > --
> > > > Ian Friedman
> > > >
> > > >
> > > > On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote:
> > > >
> > > > > I'm still a little confused by your description of the problem. It
> > > > > might be easier to understand if you listed out the exact things you
> > > > > have measured, what you saw, and what you expected to see.
> > > > >
> > > > > Since you mentioned the consumer I can give a little info on how that
> > > > > works. The consumer consumes from all the partitions it owns
> > > > > simultaneously. The behavior is that we interleave fetched chunks of
> > > > > messages from each partition the consumer is processing. The chunk
> > > > > size is controlled by the fetch size set in the consumer. So the
> > > > > behavior you would expect is that you would get a bunch of messages
> > > > > from one partition followed by a bunch from another partition. The
> > > > > reason for doing this instead of, say, interleaving individual
> > > > > messages is that it is a big performance boost--making every message
> > > > > an entry in a blocking queue gives a 5x performance hit in
> > > > > high-throughput cases. Perhaps this interleaving is the problem?
> > > > >
> > > > > -Jay
> > > > >
> > > > >
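The interleaving behavior described above can be illustrated with a toy model (not Kafka code): assume each message takes one time unit to process and the consumer drains one fetched chunk per partition in round-robin order.

```python
def max_unserved_gap(num_partitions, messages_per_partition, chunk_size):
    """Longest stretch (in message-processing time units) that any one
    partition goes without being serviced, when the consumer drains up to
    chunk_size messages from each partition in round-robin order."""
    remaining = [messages_per_partition] * num_partitions
    last_served = [0] * num_partitions
    clock = 0
    worst = 0
    while any(remaining):
        for p in range(num_partitions):
            if remaining[p] == 0:
                continue
            # Gap since this partition was last drained.
            worst = max(worst, clock - last_served[p])
            take = min(chunk_size, remaining[p])
            clock += take  # one time unit per message processed
            last_served[p] = clock
            remaining[p] -= take
    return worst

# Larger chunks mean longer stretches where the other partition sits idle:
print(max_unserved_gap(2, 1000, 10))    # -> 10
print(max_unserved_gap(2, 1000, 1000))  # -> 1000
```

With equally loaded partitions the worst gap is roughly chunk_size * (num_partitions - 1), which matches the suggestion in the thread: shrinking the fetch size bounds how long any partition waits between chunks, at some cost in throughput.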
> > > > > On Sun, Aug 25, 2013 at 10:22 AM, Ian Friedman <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Sorry I reread what I've written so far and found that it doesn't
> > > > > > state the actual problem very well. Let me clarify once again:
> > > > > >
> > > > > > The problem we're trying to solve is that we can't let messages go
> > > > > > for unbounded amounts of time without getting processed, and it
> > > > > > seems that something about what we're doing (which I suspect is the
> > > > > > fact that consumers own several partitions but only consume from one
> > > > > > of them at a time until it's caught up) is causing a small number of
> > > > > > them to sit around for hours and hours. This is despite some
> > > > > > consumers idling due to