Re: Consumer throughput imbalance
Just to make sure I have this right: on the producer side we'd set max.message.size, and then on the consumer side we'd set fetch.size? I admittedly didn't research how all the tuning options would affect us, so thank you for the info. Would queuedchunks.max have any effect?
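
For my own notes, here's a rough sketch of how I understand those settings relating. The values are made up, and the property names are just the ones we've been using in this thread, so correct me if our version spells them differently:

    import java.util.Properties;

    public class KafkaSizeSettings {
        public static void main(String[] args) {
            // Producer side: cap on the largest message we will send.
            // (Value made up; property name as discussed in this thread.)
            Properties producerProps = new Properties();
            producerProps.put("max.message.size", "1000000");

            // Consumer side: each fetched chunk is up to fetch.size bytes,
            // so it must be >= max.message.size or a message larger than
            // one fetch could never be consumed.
            Properties consumerProps = new Properties();
            consumerProps.put("fetch.size", "1048576");

            // How many fetched chunks each consumer buffers in memory
            // while waiting to be processed.
            consumerProps.put("queuedchunks.max", "10");

            System.out.println("producer: " + producerProps);
            System.out.println("consumer: " + consumerProps);
        }
    }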

--
Ian Friedman
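
P.S. For anyone else reading along, here is a toy model of the chunk interleaving Jay describes below. The classes are made up for illustration and are not Kafka's actual internals:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ChunkInterleavingModel {
        // A "chunk" is one fetch's worth of messages from a single
        // partition, up to fetch.size bytes in the real consumer.
        static class Chunk {
            final int partition;
            final List<String> messages;
            Chunk(int partition, List<String> messages) {
                this.partition = partition;
                this.messages = messages;
            }
        }

        public static void main(String[] args) throws InterruptedException {
            // The queue capacity plays the role of queuedchunks.max.
            BlockingQueue<Chunk> queue = new ArrayBlockingQueue<Chunk>(10);

            // Fetcher side: whole chunks are enqueued per partition, so the
            // consumer sees a burst from one partition, then the next,
            // rather than messages interleaved one at a time.
            queue.put(new Chunk(0, Arrays.asList("m1", "m2", "m3")));
            queue.put(new Chunk(1, Arrays.asList("m4", "m5")));

            // Consumer side: drain chunk by chunk.
            for (int i = 0; i < 2; i++) {
                Chunk chunk = queue.take();
                for (String m : chunk.messages) {
                    System.out.println("partition " + chunk.partition + ": " + m);
                }
            }
        }
    }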
On Monday, August 26, 2013 at 1:26 PM, Jay Kreps wrote:

> Yeah it is always equal to the fetch size. The fetch size needs to be at
> least equal to the max message size you have allowed on the server, though.
>
> -Jay
>
>
> On Sun, Aug 25, 2013 at 10:00 PM, Ian Friedman <[EMAIL PROTECTED]> wrote:
>
> > Jay - is there any way to control the size of the interleaved chunks? The
> > performance hit would likely be negligible for us at the moment.
> >
> > --
> > Ian Friedman
> >
> >
> > On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote:
> >
> > > I'm still a little confused by your description of the problem. It
> > > might be easier to understand if you listed out the exact things you
> > > have measured, what you saw, and what you expected to see.
> > >
> > > Since you mentioned the consumer I can give a little info on how that
> > > works. The consumer consumes from all the partitions it owns
> > > simultaneously. The behavior is that we interleave fetched data chunks
> > > of messages from each partition the consumer is processing. The chunk
> > > size is controlled by the fetch size set in the consumer. So the
> > > behavior you would expect is that you would get a bunch of messages
> > > from one partition followed by a bunch from another partition. The
> > > reason for doing this instead of, say, interleaving individual
> > > messages is that it is a big performance boost--making every message
> > > an entry in a blocking queue gives a 5x performance hit in
> > > high-throughput cases. Perhaps this interleaving is the problem?
> > >
> > > -Jay
> > >
> > >
> > > On Sun, Aug 25, 2013 at 10:22 AM, Ian Friedman <[EMAIL PROTECTED]> wrote:
> > >
> > > > Sorry, I reread what I've written so far and found that it doesn't state
> > > > the actual problem very well. Let me clarify once again:
> > > >
> > > > The problem we're trying to solve is that we can't let messages go
> > > > for unbounded amounts of time without getting processed, and it
> > > > seems that something about what we're doing (which I suspect is the
> > > > fact that consumers own several partitions but only consume from one
> > > > of them at a time until it's caught up) is causing a small number of
> > > > them to sit around for hours and hours. This is despite some
> > > > consumers idling due to being fully caught up on the partitions they
> > > > own. We've found that requeueing the oldest messages (consumers
> > > > ignore messages that have already been processed) is fairly
> > > > effective in getting them to go away, but I'm looking for a more
> > > > stable solution.
> > > >
> > > > --
> > > > Ian Friedman
> > > >
> > > >
> > > > On Sunday, August 25, 2013 at 1:15 PM, Ian Friedman wrote:
> > > >
> > > > > When I said "some messages take longer than others" that may have
> > > > > been misleading. What I meant there is that the performance of the
> > > > > entire application is inconsistent, mostly due to pressure from
> > > > > other applications (mapreduce) on our HBase and MySQL backends. On
> > > > > top of that, some messages just contain more data. Now I suppose
> > > > > what you're suggesting is that I segment my messages by the average
> > > > > or expected time it takes the payloads to process, but I suspect
> > > > > what will happen if I do that is I will have several consumers
> > > > > doing nothing most of the time, and the rest of them backlogged
> > > > > inconsistently the same way they are now. The problem isn't