Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
You can always use more partitions to get more parallelism in the consumers.

Thanks,

Jun
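
A rough sketch of the arithmetic behind this advice, assuming partitions are assigned to streams one topic at a time, as described in the quoted replies below. The function name usable_streams is made up for illustration and is not part of Kafka; the counts come from the scenario in the quoted question (2 consumers, 4 topics).

```python
def usable_streams(total_streams, partitions_per_topic):
    """Upper bound on the streams that can own at least one partition.

    Assumption: assignment happens per topic, so no topic can keep more
    streams busy than it has partitions.
    """
    return min(total_streams, max(partitions_per_topic))


# 2 consumers with num.streams=8 each give 16 streams in total.
total = 2 * 8

print(usable_streams(total, [8, 8, 8, 8]))      # 8 partitions per topic
print(usable_streams(total, [16, 16, 16, 16]))  # 16 partitions per topic
```

Under this model, 8 partitions per topic leave half of the 16 streams idle, while 16 partitions per topic can keep all of them busy.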
On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> So what is the best way to load balance multiple consumers consuming from a
> topic filter?
>
> Let's say we have 4 topics with 8 partitions each and 2 consumers.
>
> Option 1) To load balance consumers, we can set num.streams=4 so that both
> consumers split the 8 partitions, but we can then only use half of the
> consumer streams.
>
> Option 2) Configure mutually exclusive topic filter regexes such that 2
> topics will match consumer1 and 2 topics will match consumer2. Now we can
> set num.streams=8 and fully utilize the consumer streams. I believe this will
> improve performance, but if a consumer dies, we will not get any data from
> the topics used by that consumer.
>
> What would be your recommendation?
>
> Thanks,
> Raja.
>
>
> On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > >> 2) When I started mirrormaker with num.streams=16, looks like 16 consumer
> > threads were created, but only 8 are showing up as active as owner in
> > consumer offset tracker and all topics/partitions are distributed between 8
> > consumer threads.
> >
> > This is because currently the consumer rebalancing process of assigning
> > partitions to consumer streams is at a per topic level. Unless you have at
> > least one topic with 16 partitions, the remaining 8 threads will not do any
> > work. This is not ideal and we want to look into a better rebalancing
> > algorithm, though it is a big change and we prefer doing it as part of the
> > consumer client rewrite.
> >
> > Thanks,
> > Neha
> >
> >
> > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > So my understanding is that the number of active streams a consumer can
> > > utilize is the number of partitions in the topic. This is fine if we
> > > consume from a specific topic. But if we consume from a TopicFilter, I
> > > thought the consumer should be able to utilize (number of topics that
> > > match the filter * number of partitions in each topic) streams. But it
> > > looks like the number of streams the consumer can use is limited by just
> > > the number of partitions in a topic, although it's consuming from
> > > multiple topics.
> > >
> > > Here is what I observed with 1 mirrormaker consuming from whitelist '.+'.
> > >
> > > The whitelist matches 5 topics and each topic has 8 partitions. I used the
> > > consumer offset checker to look at the owner of each topic/partition.
> > >
> > > 1) When I started mirrormaker with num.streams=8, all topics/partitions
> > > are distributed between 8 consumer threads.
> > >
> > > 2) When I started mirrormaker with num.streams=16, looks like 16 consumer
> > > threads were created, but only 8 are showing up as active as owner in
> > > consumer offset tracker and all topics/partitions are distributed between
> > > 8 consumer threads.
> > >
> > > So this could be a bottleneck for consumers: although we partitioned the
> > > topics, if we are consuming from a topic filter it can't utilize much of
> > > the parallelism from the number of streams. Am I missing something? Is
> > > there a way to make consumers/mirrormakers utilize a larger number of
> > > active streams?
> > >
> > >
> > > --
> > > Thanks,
> > > Raja.
> > >
> >
>
>
>
> --
> Thanks,
> Raja.
>
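
The per-topic rebalancing discussed in this thread can be sketched roughly as follows. This is a simplified model, not Kafka's actual code: the function names, topic names, and stream ids are hypothetical, and it assumes each topic's partitions are split into contiguous ranges across the sorted list of streams. It uses the numbers from the mirrormaker observation (5 topics, 8 partitions each).

```python
from collections import defaultdict


def assign_per_topic(topics, stream_ids):
    """Range-style assignment done independently for each topic.

    topics: dict mapping topic name -> number of partitions
    stream_ids: list of consumer stream (thread) ids
    Returns a dict mapping stream id -> list of (topic, partition).
    Streams that receive nothing do not appear in the result.
    """
    streams = sorted(stream_ids)
    owned = defaultdict(list)
    for topic, num_partitions in sorted(topics.items()):
        base, extra = divmod(num_partitions, len(streams))
        start = 0
        for i, stream in enumerate(streams):
            # The first `extra` streams take one additional partition.
            count = base + (1 if i < extra else 0)
            for partition in range(start, start + count):
                owned[stream].append((topic, partition))
            start += count
    return dict(owned)


def active_streams(assignment):
    """Streams that own at least one partition."""
    return sorted(s for s, parts in assignment.items() if parts)


# Hypothetical topic names; 5 topics with 8 partitions each.
topics = {f"topic-{n}": 8 for n in range(5)}

for num_streams in (8, 16):
    ids = [f"stream-{i:02d}" for i in range(num_streams)]
    result = assign_per_topic(topics, ids)
    print(num_streams, "streams created,", len(active_streams(result)), "active")
```

Because every topic is divided across the same sorted list of streams, the same first 8 streams win a partition in each topic, so in this model the second 8 never receive any work no matter how many topics match the filter.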