Kafka, mail # user - Consumer throughput imbalance


Re: Consumer throughput imbalance
Ian Friedman 2013-08-25, 16:01
What if you don't know ahead of time how long a message will take to consume?

--
Ian Friedman
On Sunday, August 25, 2013 at 10:45 AM, Neha Narkhede wrote:

> Making producer-side partitioning depend on consumer behavior might not be
> such a good idea. If consumption is the bottleneck, changing producer-side
> partitioning may not help. To relieve a consumption bottleneck, you may need
> to increase the number of partitions for those topics and increase the
> number of consumer instances.
>
> You mentioned that the consumers take longer to process certain kinds of
> messages. What you can do is place the messages that require slower
> processing in separate topics, so that you can scale the number of
> partitions and the number of consumer instances for those messages
> independently.
>
> Thanks,
> Neha
>
>
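The separate-topic suggestion above can be sketched as a small routing step on the producer side. This is only an illustration: the message kinds, the `events` topic name, and the `-slow` suffix are hypothetical, and it assumes the producer can tell up front which kinds are slow to process, which is exactly what the reply at the top of this thread questions.

```python
# Hypothetical set of message kinds known to be slow to consume.
SLOW_KINDS = {"video_transcode", "report_build"}


def topic_for(kind, base_topic="events"):
    """Return the topic a message of the given kind should be sent to.

    Slow kinds go to a dedicated topic so that its partition count and
    consumer count can be scaled without touching the fast topic.
    """
    if kind in SLOW_KINDS:
        return base_topic + "-slow"
    return base_topic
```

The producer would call `topic_for(kind)` before each send; the slow topic can then be given more partitions and consumers than the fast one.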
> On Sat, Aug 24, 2013 at 9:57 AM, Ian Friedman <[EMAIL PROTECTED]> wrote:
>
> > Hey guys! We recently deployed our Kafka data pipeline application over
> > the weekend and it is working out quite well once we ironed out all the
> > issues. There is one behavior that we've noticed that is mildly troubling,
> > though not a deal breaker. We're using a single topic with many partitions
> > (1200 total) to load balance our 300 consumers, but what seems to happen is
> > that some partitions end up more backed up than others. This is probably
> > due more to the specifics of the application since some messages take much
> > longer than others to process.
> >
> > I'm thinking that the random partitioning in the producer is unsuited to
> > our specific needs. One option I was considering was to write an alternate
> > partitioner that looks at the consumer offsets from ZooKeeper (as in the
> > ConsumerOffsetChecker) and probabilistically weights the partitions by
> > their lag. Does this sound like a good idea to anyone else? Is there a
> > better or preferably already built solution? If anyone has any ideas or
> > feedback I'd sincerely appreciate it.
> >
> > Thanks so much in advance.
> >
> > P.S. Thanks especially to everyone who's answered my dumb questions on
> > this mailing list over the past few months; we couldn't have done it
> > without you!
> >
> > --
> > Ian Friedman
> >
>
>
>
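For reference, the lag-weighted partitioner idea from the original message could look roughly like the sketch below. It is not a Kafka `Partitioner` implementation: the lag values are assumed to be supplied by the caller as a plain dict (in practice they would be derived from the consumer offsets in ZooKeeper, as ConsumerOffsetChecker does), and the weighting formula `1 / (1 + lag)` is an arbitrary illustrative choice, not something proposed in the thread.

```python
import random


def pick_partition(lags, rng=random):
    """Pick a partition id at random, favoring partitions with less lag.

    `lags` maps partition id -> consumer lag (messages not yet consumed).
    Each partition gets weight 1 / (1 + lag), so a backed-up partition is
    chosen less often but never excluded entirely.
    """
    partitions = sorted(lags)
    weights = [1.0 / (1 + max(lags[p], 0)) for p in partitions]
    return rng.choices(partitions, weights=weights, k=1)[0]
```

As the reply above notes, this couples producer-side partitioning to consumer behavior, and the lag snapshot may be stale by the time a message is sent, so it should be read as a way to reason about the idea rather than a recommendation.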