Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Rajasekar Elango 2013-08-29, 15:03
Neha Narkhede 2013-08-29, 16:42
Rajasekar Elango 2013-08-29, 19:44
Jun Rao 2013-08-30, 03:42

Re: Num of streams for consumers using TopicFilter.
Hi Jun,

If you read my previous posts, based on the current rebalancing logic, if we
consume from a topic filter, the consumer can't actively use all streams. Can you
provide your recommendation of option 1 vs option 2 in my previous post?

Thanks,
Raja.
On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> You can always use more partitions to get more parallelism in the
> consumers.
>
> Thanks,
>
> Jun
>
>
> On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango
> <[EMAIL PROTECTED]>wrote:
>
> > So what is the best way to load balance multiple consumers consuming from
> > a topic filter?
> >
> > Let's say we have 4 topics with 8 partitions each and 2 consumers.
> >
> > Option 1) To load balance the consumers, we can set num.streams=4 so that
> > both consumers split the 8 partitions, but we can only use half of the
> > consumer streams.
> >
> > Option 2) Configure mutually exclusive topic filter regexes such that 2
> > topics will match consumer1 and 2 topics will match consumer2. Now we can
> > set num.streams=8 and fully utilize the consumer streams. I believe this
> > will improve performance, but if a consumer dies, we will not get any data
> > from the topics consumed by that consumer.
> >
> > What would be your recommendation?
> >
> > Thanks,
> > Raja.
> >
> >
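[Editor's sketch, not from the thread: option 2 above could look roughly like the code below, assuming the Kafka 0.8 high-level consumer API (createMessageStreamsByFilter with a Whitelist). The ZooKeeper address, group id, and the regexes passed on the command line are placeholder assumptions; consumer1 would be started with an exclusive regex such as "topicA|topicB" and consumer2 with "topicC|topicD", each with 8 streams.]

// Sketch only: each consumer process runs with its own mutually exclusive
// whitelist regex (args[0]) and stream count (args[1], e.g. 8).
import java.util.List;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class ExclusiveFilterConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "mirror-consumer");         // placeholder

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // This instance only sees topics matching its own regex, so the two
        // consumers never compete for the same partitions.
        List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(
                        new Whitelist(args[0]), Integer.parseInt(args[1]));

        for (final KafkaStream<byte[], byte[]> stream : streams) {
            new Thread(() -> {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                while (it.hasNext()) {
                    byte[] payload = it.next().message(); // process the message here
                }
            }).start();
        }
    }
}

[The downside Raja mentions still applies: if the process handling one regex dies, the topics matched only by that regex stop being consumed until it is restarted.]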
> > On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <[EMAIL PROTECTED]
> > >wrote:
> >
> > > >> 2) When I started mirrormaker with num.streams=16, it looks like 16
> > > >> consumer threads were created, but only 8 are showing up as active
> > > >> owners in the consumer offset tracker, and all topics/partitions are
> > > >> distributed between 8 consumer threads.
> > >
> > > This is because currently the consumer rebalancing process of assigning
> > > partitions to consumer streams is at a per-topic level. Unless you have
> > > at least one topic with 16 partitions, the remaining 8 threads will not
> > > do any work. This is not ideal and we want to look into a better
> > > rebalancing algorithm, though it is a big change and we prefer doing it
> > > as part of the consumer client rewrite.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
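[Editor's sketch, not from the thread: the per-topic behavior Neha describes can be checked with a little arithmetic. The snippet below is a simplified model of per-topic range assignment, not Kafka's actual code, using the numbers from this thread (5 topics, 8 partitions each, num.streams=16 on a single mirrormaker); it shows streams 8-15 ending up with nothing.]

import java.util.Map;
import java.util.TreeMap;

// Simplified model of the 0.8 per-topic assignment described above: each
// topic's partitions are split across the consumer streams independently,
// so streams beyond the per-topic partition count sit idle.
public class RebalanceModel {
    public static void main(String[] args) {
        int topics = 5, partitionsPerTopic = 8, totalStreams = 16; // numbers from this thread

        Map<Integer, Integer> owned = new TreeMap<>(); // stream id -> partitions owned
        for (int s = 0; s < totalStreams; s++) owned.put(s, 0);

        for (int t = 0; t < topics; t++) {
            int perStream = partitionsPerTopic / totalStreams; // 8 / 16 = 0
            int extra = partitionsPerTopic % totalStreams;     // 8 partitions left over
            for (int s = 0; s < totalStreams; s++) {
                int n = perStream + (s < extra ? 1 : 0);       // only streams 0..7 get one partition
                owned.put(s, owned.get(s) + n);
            }
        }
        // Prints: streams 0..7 own 5 partitions each (one per topic), streams 8..15 own 0.
        owned.forEach((s, n) -> System.out.println("stream " + s + " owns " + n + " partitions"));
    }
}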
> > > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <
> > [EMAIL PROTECTED]
> > > >wrote:
> > >
> > > > So my understanding is that the number of active streams a consumer
> > > > can utilize is the number of partitions in a topic. This is fine if we
> > > > consume from a specific topic. But if we consume from a TopicFilter, I
> > > > thought the consumer should be able to utilize (number of topics that
> > > > match the filter * number of partitions per topic) streams. But it
> > > > looks like the number of streams the consumer can use is limited to
> > > > just the number of partitions in a topic, although it's consuming from
> > > > multiple topics.
> > > >
> > > > Here is what I observed with 1 mirrormaker consuming from whitelist
> > > > '.+'.
> > > >
> > > > The whitelist matches 5 topics and each topic has 8 partitions. I used
> > > > the consumer offset checker to look at the owner of each topic/partition.
> > > >
> > > > 1) When I started mirrormaker with num.streams=8, all topics/partitions
> > > > are distributed between 8 consumer threads.
> > > >
> > > > 2) When I started mirrormaker with num.streams=16, it looks like 16
> > > > consumer threads were created, but only 8 are showing up as active
> > > > owners in the consumer offset tracker, and all topics/partitions are
> > > > distributed between 8 consumer threads.
> > > >
> > > > So this could be a bottleneck for consumers: although we partitioned
> > > > the topics, if we are consuming from a topic filter we can't utilize
> > > > much of the parallelism with num.streams. Am I missing something? Is
> > > > there a way to make consumers/mirrormakers utilize a larger number of
> > > > active streams?
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Raja.
> > > >
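[Editor's sketch, not from the thread: one way to reproduce the observation above, again assuming the 0.8 high-level consumer API. It opens 16 streams against the whitelist '.+' and logs which stream index ever receives a message; with 5 topics of 8 partitions, only 8 of the 16 indexes should ever appear. Connection settings are placeholders.]

import java.util.List;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class StreamActivityCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "stream-activity-check");   // placeholder

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // 16 streams over everything the whitelist matches.
        List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(new Whitelist(".+"), 16);

        for (int i = 0; i < streams.size(); i++) {
            final int streamId = i;
            final KafkaStream<byte[], byte[]> stream = streams.get(i);
            new Thread(() -> {
                ConsumerIterator<byte[], byte[]> it = stream.iterator();
                while (it.hasNext()) {
                    it.next();
                    // Only the streams that actually own partitions will ever log.
                    System.out.println("stream " + streamId + " got a message");
                }
            }).start();
        }
    }
}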
> > >
> >
> >
> >
> > --
> > Thanks,
> > Raja.
> >
>

--
Thanks,
Raja.

 
Jun Rao 2013-08-30, 04:02
Rajasekar Elango 2013-08-30, 04:09
Jun Rao 2013-08-30, 14:34
Rajasekar Elango 2013-08-30, 15:24
Jun Rao 2013-08-31, 03:41
Jason Rosenberg 2013-10-02, 20:29
Jun Rao 2013-10-03, 04:24
Jason Rosenberg 2013-10-03, 04:43
Jun Rao 2013-10-03, 14:25
Jason Rosenberg 2013-10-03, 14:57
Jason Rosenberg 2013-10-03, 15:12