Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
Right, but if you set #partitions in each topic to 16, you can use a total
of 16 streams.

Thanks,

Jun
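
The numbers discussed in this thread can be reproduced with a small simulation. This is only a sketch of the per-topic behaviour described in the quoted messages below, assuming that for each topic the sorted list of consumer threads is walked from the start and each thread receives a contiguous range of partitions. It is not Kafka's actual rebalancing code, and the names assign_per_topic, active_threads, make_threads and the thread id format are hypothetical.

```python
def assign_per_topic(partitions_per_topic, thread_ids):
    """Assign partitions to consumer threads one topic at a time.

    Simplified model (an assumption, not Kafka's actual code): for every
    topic, the sorted thread list is walked from the start and each thread
    gets a contiguous range of that topic's partitions.
    """
    ordered = sorted(thread_ids)
    owned = {thread: [] for thread in ordered}
    for topic, n_parts in sorted(partitions_per_topic.items()):
        base, extra = divmod(n_parts, len(ordered))
        start = 0
        for i, thread in enumerate(ordered):
            # The first `extra` threads take one additional partition.
            count = base + (1 if i < extra else 0)
            owned[thread].extend((topic, p) for p in range(start, start + count))
            start += count
    return owned


def active_threads(owned):
    """Return the threads that own at least one partition."""
    return sorted(thread for thread, parts in owned.items() if parts)


def make_threads(consumers, streams_per_consumer):
    """Build hypothetical thread ids such as 'consumer1-3'."""
    return [f"{c}-{s}" for c in consumers for s in range(streams_per_consumer)]


consumers = ["consumer1", "consumer2"]
topics_8 = {f"topic{i}": 8 for i in range(4)}    # 4 topics, 8 partitions each
topics_16 = {f"topic{i}": 16 for i in range(4)}  # 4 topics, 16 partitions each

for label, topics, streams in [
    ("8 partitions, 8 streams each", topics_8, 8),
    ("8 partitions, 4 streams each", topics_8, 4),
    ("16 partitions, 8 streams each", topics_16, 8),
]:
    owned = assign_per_topic(topics, make_threads(consumers, streams))
    print(label, "->", len(active_threads(owned)), "active threads")
```

Under this model the first case leaves every thread of consumer2 idle, because the same 8 threads come first in the sorted order for every topic. The second case keeps all 8 threads busy across both consumers, and the third keeps all 16 busy, which matches the suggestion to use 16 partitions per topic.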
On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> With option 1) I can't really use 8 streams in each consumer. If I do, only
> one consumer seems to be doing all the work. So I actually had to use a total
> of 8 streams, with 4 for each consumer.
>
>
>
> On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > The drawback of 2), as you said, is no auto failover. I was suggesting that
> > you use 16 partitions. Then you can use option 1) with 8 streams in each
> > consumer.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jun,
> > >
> > > If you read my previous posts, based on the current rebalancing logic, if
> > > we consume from a topic filter, the consumer can't actively use all
> > > streams. Can you provide your recommendation of option 1 vs option 2 in my
> > > previous post?
> > >
> > > Thanks,
> > > Raja.
> > >
> > >
> > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can always use more partitions to get more parallelism in the
> > > > consumers.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > So what is the best way to load balance multiple consumers consuming
> > > > > from a topic filter?
> > > > >
> > > > > Let's say we have 4 topics with 8 partitions each and 2 consumers.
> > > > >
> > > > > Option 1) To load balance the consumers, we can set num.streams=4 so
> > > > > that both consumers split the 8 partitions, but then we can only use
> > > > > half of the consumer streams.
> > > > >
> > > > > Option 2) Configure mutually exclusive topic filter regexes such that
> > > > > 2 topics will match consumer1 and 2 topics will match consumer2. Now we
> > > > > can set num.streams=8 and fully utilize the consumer streams. I believe
> > > > > this will improve performance, but if a consumer dies, we will not get
> > > > > any data from the topics used by that consumer.
> > > > >
> > > > > What would be your recommendation?
> > > > >
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > >> 2) When I started mirrormaker with num.streams=16, looks like 16
> > > > > > >> consumer threads were created, but only 8 are showing up as active
> > > > > > >> as owner in consumer offset tracker and all topics/partitions are
> > > > > > >> distributed between 8 consumer threads.
> > > > > >
> > > > > > This is because the consumer rebalancing process currently assigns
> > > > > > partitions to consumer streams at a per-topic level. Unless you have
> > > > > > at least one topic with 16 partitions, the remaining 8 threads will
> > > > > > not do any work. This is not ideal, and we want to look into a better
> > > > > > rebalancing algorithm. However, it is a big change, and we prefer
> > > > > > doing it as part of the consumer client rewrite.
> > > > > >
> > > > > > Thanks,
> > > > > > Neha
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > So my understanding is that the number of active streams a consumer
> > > > > > > can utilize is the number of partitions in the topic. This is fine
> > > > > > > if we consume from a specific topic. But if we consume from a
> > > > > > > TopicFilter, I thought the consumer should be able to utilize
> > > > > > > (number of topics that match filter * number of partitions in
> > > > > > > topic). But looks like number of streams that consumer can use