Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Num of streams for consumers using TopicFilter.


Copy link to this message
-
Re: Num of streams for consumers using TopicFilter.
I filed this, to address the need for allowing parallelism when consuming
multiple single-partition topics selected with a topic filter:
https://issues.apache.org/jira/browse/KAFKA-1072
On Thu, Oct 3, 2013 at 10:56 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ah,
>
> So this is exposed directly in the simple consumer (but not the high-level
> one?).
>
> Jason
>
>
> On Thu, Oct 3, 2013 at 10:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
>> See
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Oct 2, 2013 at 9:42 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>>
>> > Jun,
>> >
>> > Thanks, can you point me to the client code to issue a metadata request!
>> >
>> > Jason
>> >
>> >
>> > On Thu, Oct 3, 2013 at 12:24 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> >
>> > > It's fixable. Since we plan to rewrite the consumer client code in the
>> > near
>> > > future, it could be considered at that point.
>> > >
>> > > If you issue a metadata request with an empty topic list, you will get
>> > back
>> > > the metadata of all topics.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > >
>> > > On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]>
>> > wrote:
>> > >
>> > > > How hard would it be to fix this issue, where we have a topic filter
>> > that
>> > > > matches multiple topics, for the load to be distributed over
>> multiple
>> > > > threads, and over multiple consumers?  For some reason, I had
>> thought
>> > > this
>> > > > issue was fixed in 0.8, but I guess not?
>> > > >
>> > > > I am currently using a single partition, for multiple topics.  I
>> worry
>> > > that
>> > > > it won't scale ultimately to only ever have one thread on one
>> consumer
>> > > > doing all the work......We could move to multiple partitions, but
>> for
>> > > > ordering reasons in some use cases, this is not always ideal.
>> > > >
>> > > > Perhaps I can come up with some sort of dynamic topic sniffer, and
>> have
>> > > it
>> > > > evenly divide the available topics between the available consumers
>> (and
>> > > > threads per consumer)!  Is there a simple api within the kafka
>> client
>> > > code,
>> > > > for getting the list of topics?
>> > > >
>> > > > Jason
>> > > >
>> > > >
>> > > > On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > It seems to me option 1) is easer. Option 2) has the same issue as
>> > > option
>> > > > > 1) since you have to manage different while lists.
>> > > > >
>> > > > > A more general solution is probably to change the consumer
>> > distribution
>> > > > > model to divide partitions across topics. That way, one can
>> create as
>> > > > many
>> > > > > streams as total # partitions for all topics. We can look into
>> that
>> > in
>> > > > the
>> > > > > future.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > >
>> > > > > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <
>> > > > [EMAIL PROTECTED]
>> > > > > >wrote:
>> > > > >
>> > > > > > Yeah. The actual bottleneck is actually number of topics that
>> match
>> > > the
>> > > > > > topic filter. Num of streams is going be shared between all
>> topics
>> > > it's
>> > > > > > consuming from. I thought about following ideas to work around
>> > this.
>> > > (I
>> > > > > am
>> > > > > > basically referring to mirrormaker consumer in examples).
>> > > > > >
>> > > > > > Option 1). Instead of running one mirrormaker process with topic
>> > > filter
>> > > > > > ".+", We can start multiple mirrormaker process with topic
>> filter
>> > > > > matching
>> > > > > > each topic (Eg: mirrormaker1 => whitelist topic1.* ,
>> mirrormaker2
>> > > > > > => whitelist topic2.* etc)
>> > > > > >
>> > > > > > But this adds some operations overhead to start and manage
>> multiple
>> > > > > > processes on the host.
>> > > > > >
>> > > > > > Option 2) Modify mirrormaker code to support list of whitelist
>> > > filters
>> > > > > and
>> > >
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB