Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
Right, but if you set #partitions in each topic to 16, you can use a total
of 16 streams.

Thanks,

Jun
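
The per-topic rebalancing described in the quoted messages below can be sketched with a small simulation. This is a simplified model, not Kafka's actual code: it assumes every stream subscribes to every topic matched by the filter, that streams are ordered by (consumer id, stream index), and that each topic's partitions are handed out in contiguous ranges to the streams at the front of that ordering. The topic names, consumer ids, and the helpers `assign`, `active_threads`, and `active_per_consumer` are hypothetical.

```python
def assign(topics, consumers):
    """Simplified per-topic range assignment (an assumed model, not Kafka code).

    topics:    {topic name: partition count}
    consumers: {consumer id: num.streams}
    Returns {(consumer id, stream index): [(topic, partition), ...]}.
    """
    # Every stream is a candidate owner for every topic.
    threads = sorted((c, i) for c, n in consumers.items() for i in range(n))
    owners = {t: [] for t in threads}
    for topic, n_parts in sorted(topics.items()):
        # Each topic is divided independently over the same ordered streams.
        per_thread, extra = divmod(n_parts, len(threads))
        for pos, thread in enumerate(threads):
            start = per_thread * pos + min(pos, extra)
            count = per_thread + (1 if pos < extra else 0)
            owners[thread].extend((topic, p) for p in range(start, start + count))
    return owners


def active_threads(owners):
    """Streams that own at least one partition."""
    return sorted(t for t, parts in owners.items() if parts)


def active_per_consumer(owners):
    """Number of active streams per consumer id."""
    counts = {}
    for consumer, _ in active_threads(owners):
        counts[consumer] = counts.get(consumer, 0) + 1
    return counts


# Hypothetical topics: 4 topics with 8 or 16 partitions each.
topics_8 = {f"topic{i}": 8 for i in range(1, 5)}
topics_16 = {f"topic{i}": 16 for i in range(1, 5)}

print(active_per_consumer(assign(topics_8, {"mm": 16})))           # {'mm': 8}
print(active_per_consumer(assign(topics_8, {"c1": 8, "c2": 8})))   # {'c1': 8}
print(active_per_consumer(assign(topics_8, {"c1": 4, "c2": 4})))   # {'c1': 4, 'c2': 4}
print(active_per_consumer(assign(topics_16, {"c1": 8, "c2": 8})))  # {'c1': 8, 'c2': 8}
```

Under these assumptions the model reproduces what the thread reports: with 8 partitions per topic only 8 streams ever own partitions, two consumers with 8 streams each leave one consumer idle, and 16 partitions per topic keep all 16 streams busy.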
On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> With option 1) I can't really use 8 streams in each consumer. If I do, only
> one consumer seems to be doing all the work. So I actually had to use a total
> of 8 streams, with 4 for each consumer.
>
>
>
> On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > The drawback of 2), as you said is no auto failover. I was suggesting that
> > you use 16 partitions. Then you can use option 1) with 8 streams in each
> > consumer.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jun,
> > >
> > > If you read my previous posts, based on the current rebalancing logic, if
> > > we consume from a topic filter, consumers can't actively use all streams.
> > > Can you provide your recommendation of option 1 vs option 2 in my previous
> > > post?
> > >
> > > Thanks,
> > > Raja.
> > >
> > >
> > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can always use more partitions to get more parallelism in the
> > > > consumers.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > So what is the best way to load balance multiple consumers consuming
> > > > > from a topic filter?
> > > > >
> > > > > Let's say we have 4 topics with 8 partitions each and 2 consumers.
> > > > >
> > > > > Option 1) To load balance consumers, we can set num.streams=4 so that
> > > > > both consumers split the 8 partitions, but then we can only use half of
> > > > > the consumer streams.
> > > > >
> > > > > Option 2) Configure mutually exclusive topic filter regexes such that 2
> > > > > topics match consumer1 and 2 topics match consumer2. Now we can set
> > > > > num.streams=8 and fully utilize the consumer streams. I believe this
> > > > > will improve performance, but if a consumer dies, we will not get any
> > > > > data from the topics used by that consumer.
> > > > >
> > > > > What would be your recommendation?
> > > > >
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > > >> 2) When I started mirrormaker with num.streams=16, it looks like 16
> > > > > > consumer threads were created, but only 8 are showing up as active
> > > > > > owners in the consumer offset tracker, and all topics/partitions are
> > > > > > distributed between those 8 consumer threads.
> > > > > >
> > > > > > This is because currently the consumer rebalancing process of
> > > > > > assigning partitions to consumer streams is at a per topic level.
> > > > > > Unless you have at least one topic with 16 partitions, the remaining 8
> > > > > > threads will not do any work. This is not ideal and we want to look
> > > > > > into a better rebalancing algorithm. Though it is a big change and we
> > > > > > prefer doing it as part of the consumer client rewrite.
> > > > > >
> > > > > > Thanks,
> > > > > > Neha
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > So my understanding is that the number of active streams a consumer
> > > > > > > can utilize is the number of partitions in the topic. This is fine if
> > > > > > > we consume from a specific topic. But if we consume from a TopicFilter,
> > > > > > > I thought the consumer should be able to utilize (number of topics that
> > > > > > > match the filter * number of partitions in each topic). But it looks
> > > > > > > like the number of streams that the consumer can use

 