Kafka, mail # user - Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
Jun Rao 2013-10-03, 04:24
It's fixable. Since we plan to rewrite the consumer client code in the near
future, it could be considered at that point.

If you issue a metadata request with an empty topic list, you will get back
the metadata of all topics.
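
Once the topic list is in hand, the "evenly divide the available topics" idea from the quoted message below could be sketched roughly as follows. This is only an illustration, not Kafka client code: divide_topics, the consumer names, and the thread counts are all hypothetical, and the topic list is assumed to have been fetched already (for example via the metadata request described above).

```python
from typing import Dict, List, Tuple

Slot = Tuple[str, int]  # (consumer name, thread index); both hypothetical


def divide_topics(topics: List[str], consumers: List[str],
                  threads_per_consumer: int) -> Dict[Slot, List[str]]:
    """Deal topics round-robin across every (consumer, thread) slot."""
    if not consumers or threads_per_consumer < 1:
        raise ValueError("need at least one consumer and one thread")
    # Interleave consumers so the first few topics land on different
    # consumers instead of filling one consumer's threads first.
    slots = [(c, t) for t in range(threads_per_consumer)
             for c in sorted(consumers)]
    assignment: Dict[Slot, List[str]] = {slot: [] for slot in slots}
    # Sort the topics so every process derives the same split from the
    # same topic list, without any coordination.
    for i, topic in enumerate(sorted(topics)):
        assignment[slots[i % len(slots)]].append(topic)
    return assignment


if __name__ == "__main__":
    topics = [f"topic{i}" for i in range(1, 6)]
    split = divide_topics(topics, ["consumer-a", "consumer-b"], 2)
    for slot, owned in split.items():
        print(slot, owned)
```

Each topic stays with exactly one thread, so per-topic ordering is preserved even with single-partition topics. The split is static, though: if a consumer goes away, something has to recompute it, since nothing here provides automatic failover.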

Thanks,

Jun
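
For readers following the numbers in the quoted thread below (one busy thread with single-partition topics, 16 streams with 16 partitions), here is a simplified model of per-topic range assignment. It is a sketch of how the 0.8 high-level consumer is understood to divide each topic's partitions over the sorted consumer threads, not the actual client code. The thread ids are hypothetical (real ids include host and other details), and the 8-partition case is an assumption that the thread implies but this excerpt does not state.

```python
from typing import Dict, List


def range_assign(num_partitions: int, thread_ids: List[str]) -> Dict[str, List[int]]:
    """Split one topic's partitions into contiguous ranges over sorted threads."""
    threads = sorted(thread_ids)
    per_thread, extra = divmod(num_partitions, len(threads))
    owned: Dict[str, List[int]] = {}
    for pos, thread in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        owned[thread] = list(range(start, start + count))
    return owned


def busy_threads(partitions_by_topic: Dict[str, int], thread_ids: List[str]) -> List[str]:
    """Threads that own at least one partition once every topic is assigned."""
    busy = set()
    # Each topic is assigned independently, always starting from the first
    # thread in sorted order.
    for num_partitions in partitions_by_topic.values():
        for thread, parts in range_assign(num_partitions, thread_ids).items():
            if parts:
                busy.add(thread)
    return sorted(busy)


def make_thread_ids(consumers: List[str], streams: int) -> List[str]:
    """Hypothetical thread ids of the form '<consumer>-<stream index>'."""
    return [f"{c}-{s}" for c in consumers for s in range(streams)]


if __name__ == "__main__":
    # Many single-partition topics: the first thread ends up owning all of them.
    single = {"topic1": 1, "topic2": 1, "topic3": 1}
    print(busy_threads(single, make_thread_ids(["c1", "c2"], 2)))
    # Assumed 8 partitions, 2 consumers x 8 streams: only c1's threads are busy.
    print(busy_threads({"topic": 8}, make_thread_ids(["c1", "c2"], 8)))
    # 16 partitions, 2 consumers x 8 streams: every thread owns one partition.
    print(len(busy_threads({"topic": 16}, make_thread_ids(["c1", "c2"], 8))))
```

Because every topic is split on its own, adding topics never adds usable threads in this model; only the partition count of a single topic does. That is the limit the "divide partitions across topics" proposal in the quoted thread would lift.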
On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> How hard would it be to fix this issue, where we have a topic filter that
> matches multiple topics, for the load to be distributed over multiple
> threads, and over multiple consumers?  For some reason, I had thought this
> issue was fixed in 0.8, but I guess not?
>
> I am currently using a single partition for each of multiple topics.  I
> worry that it ultimately won't scale to only ever have one thread on one
> consumer doing all the work... We could move to multiple partitions, but
> for ordering reasons in some use cases, this is not always ideal.
>
> Perhaps I can come up with some sort of dynamic topic sniffer, and have it
> evenly divide the available topics between the available consumers (and
> threads per consumer)!  Is there a simple api within the kafka client code,
> for getting the list of topics?
>
> Jason
>
>
> On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > It seems to me option 1) is easier. Option 2) has the same issue as option
> > 1) since you have to manage different whitelists.
> >
> > A more general solution is probably to change the consumer distribution
> > model to divide partitions across topics. That way, one can create as many
> > streams as total # partitions for all topics. We can look into that in the
> > future.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> >
> > > Yeah. The actual bottleneck is the number of topics that match the
> > > topic filter. The number of streams is going to be shared between all
> > > topics it's consuming from. I thought about the following ideas to work
> > > around this. (I am basically referring to the mirrormaker consumer in
> > > the examples.)
> > >
> > > Option 1) Instead of running one mirrormaker process with topic filter
> > > ".+", we can start multiple mirrormaker processes with topic filters
> > > matching each topic (e.g. mirrormaker1 => whitelist topic1.*,
> > > mirrormaker2 => whitelist topic2.*, etc.)
> > >
> > > But this adds some operations overhead to start and manage multiple
> > > processes on the host.
> > >
> > > Option 2) Modify the mirrormaker code to support a list of whitelist
> > > filters, so that it creates message streams for each filter
> > > (calling createMessageStreamsByFilter for each filter).
> > >
> > > What would be your recommendation? If adding this feature to
> > > mirrormaker is worthwhile for Kafka, we can do option 2.
> > >
> > > Thanks,
> > > Raja.
> > >
> > >
> > >
> > >
> > > On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Right, but if you set #partitions in each topic to 16, you can use a
> > > > total of 16 streams.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > With option 1) I can't really use 8 streams in each consumer. If I
> > > > > do, only one consumer seems to be doing all the work. So I actually
> > > > > had to use a total of 8 streams, with 4 for each consumer.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > The drawback of 2), as you said, is no auto failover. I was
> > > > > > suggesting that you use 16 partitions. Then you can use option 1)
> > > > > > with 8 streams in each consumer.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > >