Kafka, mail # user - Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
Jun Rao 2013-08-31, 03:41
It seems to me option 1) is easier. Option 2) has the same issue as option
1) since you have to manage different whitelists.

A more general solution is probably to change the consumer distribution
model to divide partitions across topics. That way, one can create as many
streams as the total # of partitions across all topics. We can look into
that in the future.

Thanks,

Jun
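
[Illustrative sketch, not part of the original thread] As a rough picture of the
distribution-model change Jun describes, the Java snippet below compares a
simplified per-topic assignment (the 0.8-style behavior, where each topic's
partitions are spread over all streams independently) with a hypothetical model
that pools all partitions and divides them across all streams. It uses the
numbers from this thread (4 topics with 8 partitions each, 16 streams across 2
consumers) and is a simplification, not Kafka's actual rebalance code.

// Simplified comparison of stream utilization under two assignment models.
// Not Kafka source; the counts (4 topics x 8 partitions, 16 streams) come from this thread.
public class AssignmentSketch {
    public static void main(String[] args) {
        int topics = 4, partitionsPerTopic = 8, streams = 16;

        // Per-topic model: each topic's 8 partitions are ranged over the 16 streams
        // independently, so the same first 8 streams win every topic and 8 stay idle.
        int[] perTopic = new int[streams];
        for (int t = 0; t < topics; t++) {
            int base = partitionsPerTopic / streams;        // 0
            int extra = partitionsPerTopic % streams;       // 8
            for (int s = 0; s < streams; s++) {
                perTopic[s] += base + (s < extra ? 1 : 0);  // streams 0-7 get one partition per topic
            }
        }

        // Pooled model: all 32 partitions divided round-robin across the 16 streams.
        int[] pooled = new int[streams];
        for (int p = 0; p < topics * partitionsPerTopic; p++) {
            pooled[p % streams]++;                          // every stream gets 2 partitions
        }

        System.out.println("per-topic model: " + java.util.Arrays.toString(perTopic));
        System.out.println("pooled model:    " + java.util.Arrays.toString(pooled));
    }
}

With the per-topic model half the streams end up with nothing to do, which is
the limitation discussed in the quoted messages below; pooling all partitions
would let every stream carry work.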
On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> Yeah. The actual bottleneck is the number of topics that match the topic
> filter. The number of streams is going to be shared across all topics it's
> consuming from. I thought about the following ideas to work around this. (I
> am basically referring to the mirrormaker consumer in my examples.)
>
> Option 1) Instead of running one mirrormaker process with topic filter
> ".+", we can start multiple mirrormaker processes, each with a topic filter
> matching one topic (e.g. mirrormaker1 => whitelist topic1.*, mirrormaker2
> => whitelist topic2.*, etc.)
>
> But this adds some operational overhead to start and manage multiple
> processes on the host.
>
> Option 2) Modify the mirrormaker code to support a list of whitelist
> filters and have it create message streams for each filter (i.e., call
> createMessageStreamsByFilter for each filter).
>
> What would be your recommendation? If adding this feature to mirrormaker is
> worthwhile for Kafka, we can do option 2.
>
> Thanks,
> Raja.
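
[Illustrative sketch, not part of the original thread] A minimal Java sketch of
what option 2 might look like against the 0.8 high-level consumer API that
mirrormaker's consumer side uses: one connector, one createMessageStreamsByFilter
call per whitelist. The ZooKeeper address, group id, filter regexes, and stream
count are placeholders, and the sketch assumes the connector accepts repeated
createMessageStreamsByFilter calls; it is not the actual mirrormaker change
being proposed.

// Sketch of option 2: create a set of streams per whitelist filter.
// All property values and regexes below are placeholders, not values from the thread.
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class MultiWhitelistSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // placeholder
        props.put("group.id", "mirrormaker-sketch");       // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        String[] whitelists = { "topic1.*", "topic2.*" };  // placeholder regexes
        int streamsPerFilter = 8;

        List<KafkaStream<byte[], byte[]>> allStreams =
            new ArrayList<KafkaStream<byte[], byte[]>>();
        for (String regex : whitelists) {
            // One call to createMessageStreamsByFilter per whitelist, as option 2 proposes.
            allStreams.addAll(
                connector.createMessageStreamsByFilter(new Whitelist(regex), streamsPerFilter));
        }
        // allStreams would then be handed off to the mirroring threads.
    }
}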
>
>
>
>
> On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Right, but if you set #partitions in each topic to 16, you can use a
> > total of 16 streams.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > With option 1) I can't really use 8 streams in each consumer; if I do,
> > > only one consumer seems to be doing all the work. So I had to actually
> > > use a total of 8 streams, with 4 for each consumer.
> > >
> > >
> > >
> > > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > The drawback of 2), as you said, is no auto failover. I was
> > > > suggesting that you use 16 partitions. Then you can use option 1)
> > > > with 8 streams in each consumer.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > If you read my previous posts, based on the current rebalancing
> > > > > logic, if we consume from a topic filter, the consumer can't
> > > > > actively use all streams. Can you provide your recommendation on
> > > > > option 1 vs option 2 from my previous post?
> > > > >
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > You can always use more partitions to get more parallelism in the
> > > > > > consumers.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > So what is the best way to load balance multiple consumers
> > > > > > > consuming from a topic filter?
> > > > > > >
> > > > > > > Let's say we have 4 topics with 8 partitions each and 2
> > > > > > > consumers.
> > > > > > >
> > > > > > > Option 1) To load balance consumers, we can set num.streams=4
> > > > > > > so that both consumers split the 8 partitions, but we can only
> > > > > > > use half of the consumer streams.
> > > > > > >
> > > > > > > Option 2) Configure mutually exclusive topic filter regexes
> > > > > > > such that 2 topics match consumer1 and 2 topics match
> > > > > > > consumer2. Now we can set num.streams=8 and fully utilize the
> > > > > > > consumer streams. I believe this will improve performance, but
> > > > > > > if a consumer dies, we will not get any data from