Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
It seems to me option 1) is easier. Option 2) has the same issue as option
1), since you still have to manage the different whitelists.

A more general solution is probably to change the consumer distribution
model to divide the partitions of all topics across streams together. That
way, one can create as many streams as the total # of partitions across all
topics. We can look into that in the future.

Thanks,

Jun
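The difference between the two distribution models can be sketched with a small simulation. This is not Kafka code: the per-topic assignment below is only a simplified approximation of the 0.8 high-level consumer's range assignment, and all names (class, stream ids) are made up for illustration.

```java
import java.util.*;

public class StreamAssignmentSketch {
    // Approximation of the current model: each topic's partitions are ranged
    // across ALL stream ids independently, sorted by id. With more streams
    // than partitions per topic, the streams that sort last get nothing.
    static Map<String, Integer> perTopic(List<String> streams, int topics, int partsPerTopic) {
        List<String> sorted = new ArrayList<>(streams);
        Collections.sort(sorted);
        Map<String, Integer> load = new HashMap<>();
        for (String s : sorted) load.put(s, 0);
        for (int t = 0; t < topics; t++) {
            int base = partsPerTopic / sorted.size();
            int extra = partsPerTopic % sorted.size();
            for (int i = 0; i < sorted.size(); i++) {
                int n = base + (i < extra ? 1 : 0); // first streams absorb the remainder
                load.merge(sorted.get(i), n, Integer::sum);
            }
        }
        return load;
    }

    // The more general model: pool every (topic, partition) pair and divide
    // the whole pool across streams, so up to topics * partsPerTopic streams
    // can stay busy.
    static Map<String, Integer> pooled(List<String> streams, int topics, int partsPerTopic) {
        List<String> sorted = new ArrayList<>(streams);
        Collections.sort(sorted);
        Map<String, Integer> load = new HashMap<>();
        for (String s : sorted) load.put(s, 0);
        int total = topics * partsPerTopic;
        for (int p = 0; p < total; p++)
            load.merge(sorted.get(p % sorted.size()), 1, Integer::sum);
        return load;
    }

    public static void main(String[] args) {
        // Raja's scenario: 4 topics x 8 partitions, 2 consumers x 8 streams each.
        List<String> streams = new ArrayList<>();
        for (int c = 1; c <= 2; c++)
            for (int s = 0; s < 8; s++) streams.add("consumer" + c + "-" + s);

        long idleNow = perTopic(streams, 4, 8).values().stream().filter(v -> v == 0).count();
        long idleNew = pooled(streams, 4, 8).values().stream().filter(v -> v == 0).count();
        System.out.println("idle streams, per-topic model: " + idleNow); // all of consumer2's streams
        System.out.println("idle streams, pooled model:    " + idleNew);
    }
}
```

In this toy run the per-topic model leaves consumer2's 8 streams idle (every topic's 8 partitions land on the 8 streams that sort first), while the pooled model keeps all 16 streams busy.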
On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:

> Yeah. The actual bottleneck is the number of topics that match the topic
> filter. The number of streams is going to be shared between all topics
> it's consuming from. I thought about the following ideas to work around
> this (I am basically referring to the mirrormaker consumer in the examples).
>
> Option 1) Instead of running one mirrormaker process with topic filter
> ".+", we can start multiple mirrormaker processes, each with a topic filter
> matching one topic (e.g. mirrormaker1 => whitelist topic1.*, mirrormaker2
> => whitelist topic2.*, etc.)
>
> But this adds some operational overhead to start and manage multiple
> processes on the host.
>
> Option 2) Modify the mirrormaker code to support a list of whitelist
> filters; it should create message streams for each filter
> (call createMessageStreamsByFilter for each filter).
>
> What would be your recommendation? If adding this feature to mirrormaker
> is worthwhile for Kafka, we can do option 2.
>
> Thanks,
> Raja.
>
>
>
>
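For reference, option 1 might look roughly like the following. The flag names are the 0.8-era MirrorMaker options; the `.properties` file names and whitelist regexes are placeholders.

```shell
# Option 1 sketch: one MirrorMaker process per topic family, each with its
# own whitelist, started and managed separately on the host.
bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config consumer1.properties \
  --producer.config producer.properties \
  --num.streams 8 \
  --whitelist 'topic1.*' &

bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config consumer2.properties \
  --producer.config producer.properties \
  --num.streams 8 \
  --whitelist 'topic2.*' &
```

Each process then rebalances only over the topics its own whitelist matches, at the cost of operating N processes instead of one.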
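Option 2's idea can be sketched against the 0.8 high-level consumer API. This is only an illustration of the per-filter loop, not how mirrormaker is actually structured; it requires a running ZooKeeper/Kafka, and the connection properties, group id, and filter strings are placeholders.

```java
import java.util.List;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class MultiFilterSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "mirror-group");            // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One createMessageStreamsByFilter call per whitelist, instead of a
        // single ".+" filter; each filter gets its own full set of streams.
        String[] filters = {"topic1.*", "topic2.*"};
        int numStreams = 8;
        for (String f : filters) {
            List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(new Whitelist(f), numStreams);
            // hand each stream off to its own consumer thread here
        }
    }
}
```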
> On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Right, but if you set #partitions in each topic to 16, you can use a
> > total of 16 streams.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > > On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > With option 1) I can't really use 8 streams in each consumer; if I do,
> > > only one consumer seems to be doing all the work. So I had to use a
> > > total of 8 streams, with 4 for each consumer.
> > >
> > >
> > >
> > > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > The drawback of 2), as you said, is no auto failover. I was
> > > > suggesting that you use 16 partitions. Then you can use option 1)
> > > > with 8 streams in each consumer.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > If you read my previous posts: based on the current rebalancing
> > > > > logic, if we consume from a topic filter, not every consumer
> > > > > actively uses all of its streams. Can you provide your
> > > > > recommendation of option 1 vs option 2 in my previous post?
> > > > >
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > You can always use more partitions to get more parallelism in the
> > > > > > consumers.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > So what is the best way to load balance multiple consumers
> > > > > > > consuming from a topic filter?
> > > > > > >
> > > > > > > Let's say we have 4 topics with 8 partitions each and 2 consumers.
> > > > > > >
> > > > > > > Option 1) To load balance consumers, we can set num.streams=4
> > > > > > > so that both consumers split the 8 partitions, but then we can
> > > > > > > only use half of the consumer streams.
> > > > > > >
> > > > > > > Option 2) Configure mutually exclusive topic filter regexes
> > > > > > > such that 2 topics will match consumer1 and 2 topics will match
> > > > > > > consumer2. Now we can set num.streams=8 and fully utilize the
> > > > > > > consumer streams. I believe this will improve performance, but
> > > > > > > if a consumer dies, we will not get any data from