Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka, mail # user - Num of streams for consumers using TopicFilter.


Copy link to this message
-
Re: Num of streams for consumers using TopicFilter.
Jason Rosenberg 2013-10-03, 15:12
I filed this, to address the need for allowing parallelism when consuming
multiple single-partition topics selected with a topic filter:
https://issues.apache.org/jira/browse/KAFKA-1072
On Thu, Oct 3, 2013 at 10:56 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ah,
>
> So this is exposed directly in the simple consumer (but not the high-level
> one?).
>
> Jason
>
>
> On Thu, Oct 3, 2013 at 10:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
>> See
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Oct 2, 2013 at 9:42 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>>
>> > Jun,
>> >
>> > Thanks, can you point me to the client code to issue a metadata request!
>> >
>> > Jason
>> >
>> >
>> > On Thu, Oct 3, 2013 at 12:24 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> >
>> > > It's fixable. Since we plan to rewrite the consumer client code in the
>> > near
>> > > future, it could be considered at that point.
>> > >
>> > > If you issue a metadata request with an empty topic list, you will get
>> > back
>> > > the metadata of all topics.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > >
>> > > On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]>
>> > wrote:
>> > >
>> > > > How hard would it be to fix this issue, where we have a topic filter
>> > that
>> > > > matches multiple topics, for the load to be distributed over
>> multiple
>> > > > threads, and over multiple consumers?  For some reason, I had
>> thought
>> > > this
>> > > > issue was fixed in 0.8, but I guess not?
>> > > >
>> > > > I am currently using a single partition, for multiple topics.  I
>> worry
>> > > that
>> > > > it won't scale ultimately to only ever have one thread on one
>> consumer
>> > > > doing all the work......We could move to multiple partitions, but
>> for
>> > > > ordering reasons in some use cases, this is not always ideal.
>> > > >
>> > > > Perhaps I can come up with some sort of dynamic topic sniffer, and
>> have
>> > > it
>> > > > evenly divide the available topics between the available consumers
>> (and
>> > > > threads per consumer)!  Is there a simple api within the kafka
>> client
>> > > code,
>> > > > for getting the list of topics?
>> > > >
>> > > > Jason
>> > > >
>> > > >
>> > > > On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > It seems to me option 1) is easer. Option 2) has the same issue as
>> > > option
>> > > > > 1) since you have to manage different while lists.
>> > > > >
>> > > > > A more general solution is probably to change the consumer
>> > distribution
>> > > > > model to divide partitions across topics. That way, one can
>> create as
>> > > > many
>> > > > > streams as total # partitions for all topics. We can look into
>> that
>> > in
>> > > > the
>> > > > > future.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <
>> > > > [EMAIL PROTECTED]
>> > > > > >wrote:
>> > > > >
>> > > > > > Yeah. The actual bottleneck is actually number of topics that
>> match
>> > > the
>> > > > > > topic filter. Num of streams is going be shared between all
>> topics
>> > > it's
>> > > > > > consuming from. I thought about following ideas to work around
>> > this.
>> > > (I
>> > > > > am
>> > > > > > basically referring to mirrormaker consumer in examples).
>> > > > > >
>> > > > > > Option 1). Instead of running one mirrormaker process with topic
>> > > filter
>> > > > > > ".+", We can start multiple mirrormaker process with topic
>> filter
>> > > > > matching
>> > > > > > each topic (Eg: mirrormaker1 => whitelist topic1.* ,
>> mirrormaker2
>> > > > > > => whitelist topic2.* etc)
>> > > > > >
>> > > > > > But this adds some operations overhead to start and manage
>> multiple
>> > > > > > processes on the host.
>> > > > > >
>> > > > > > Option 2) Modify mirrormaker code to support list of whitelist
>> > > filters
>> > > > > and
>> > >