Kafka, mail # user - Num of streams for consumers using TopicFilter.


Re: Num of streams for consumers using TopicFilter.
Jason Rosenberg 2013-10-03, 14:57
Ah,

So this is exposed directly in the simple consumer (but not the high-level
one?).

Jason
On Thu, Oct 3, 2013 at 10:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> See
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> Thanks,
>
> Jun
>
>
> On Wed, Oct 2, 2013 at 9:42 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > Jun,
> >
> > Thanks, can you point me to the client code to issue a metadata request?
> >
> > Jason
> >
> >
> > On Thu, Oct 3, 2013 at 12:24 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > It's fixable. Since we plan to rewrite the consumer client code in the near
> > > future, it could be considered at that point.
> > >
> > > If you issue a metadata request with an empty topic list, you will get back
> > > the metadata of all topics.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > >
> > > > How hard would it be to fix this issue, where we have a topic filter that
> > > > matches multiple topics, for the load to be distributed over multiple
> > > > threads, and over multiple consumers?  For some reason, I had thought this
> > > > issue was fixed in 0.8, but I guess not?
> > > >
> > > > I am currently using a single partition per topic, for multiple topics.  I
> > > > worry that it won't scale ultimately to only ever have one thread on one
> > > > consumer doing all the work. We could move to multiple partitions, but for
> > > > ordering reasons in some use cases, this is not always ideal.
> > > >
> > > > Perhaps I can come up with some sort of dynamic topic sniffer, and have it
> > > > evenly divide the available topics between the available consumers (and
> > > > threads per consumer).  Is there a simple api within the kafka client code
> > > > for getting the list of topics?
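
The "topic sniffer" idea above could be sketched roughly as follows, in Python for brevity. This assumes the topic list has already been fetched by some other means; `assign_topics`, the consumer names, and the thread count are hypothetical and are not part of the Kafka client API. Inputs are sorted so that every consumer can compute the same plan on its own.

```python
from collections import defaultdict


def assign_topics(topics, consumers, threads_per_consumer):
    """Round-robin topics over every (consumer, thread) slot.

    Hypothetical helper, not a Kafka API. Slots are ordered thread-major
    so that a short topic list spreads across consumers first.
    """
    slots = [(c, t)
             for t in range(threads_per_consumer)
             for c in sorted(consumers)]
    if not slots:
        raise ValueError("need at least one consumer thread")
    assignment = defaultdict(list)
    for i, topic in enumerate(sorted(set(topics))):
        assignment[slots[i % len(slots)]].append(topic)
    return dict(assignment)


# Five topics over two consumers with two threads each.
plan = assign_topics(["t1", "t2", "t3", "t4", "t5"], ["c1", "c2"], 2)
```

Each consumer would then subscribe only to the topics listed under its own slots, and recompute the plan whenever the topic list or the set of consumers changes.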
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > It seems to me option 1) is easier. Option 2) has the same issue as option
> > > > > 1) since you have to manage different whitelists.
> > > > >
> > > > > A more general solution is probably to change the consumer distribution
> > > > > model to divide partitions across topics. That way, one can create as many
> > > > > streams as total # partitions for all topics. We can look into that in the
> > > > > future.
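
As an illustration of the pooled model described above, here is a small Python sketch that flattens every (topic, partition) pair into one list and deals the pairs out across streams. It shows the proposal only; it is not how the 0.8 high-level consumer assigns partitions, and `pooled_assignment` is a hypothetical name.

```python
def pooled_assignment(partitions_per_topic, num_streams):
    """Spread every (topic, partition) pair over num_streams streams.

    partitions_per_topic maps a topic name to its partition count.
    Hypothetical helper illustrating the proposed model only.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be positive")
    pairs = [(topic, p)
             for topic in sorted(partitions_per_topic)
             for p in range(partitions_per_topic[topic])]
    streams = [[] for _ in range(num_streams)]
    for i, pair in enumerate(pairs):
        streams[i % num_streams].append(pair)
    return streams


# Three single-partition topics can keep three streams busy.
streams = pooled_assignment({"a": 1, "b": 1, "c": 1}, 3)
```

Under this model the useful number of streams is bounded by the total partition count across all matched topics, rather than by the partition count of any one topic.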
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Yeah. The actual bottleneck is the number of topics that match the
> > > > > > topic filter. Num of streams is going to be shared between all topics it's
> > > > > > consuming from. I thought about the following ideas to work around this.
> > > > > > (I am basically referring to the mirrormaker consumer in examples.)
> > > > > >
> > > > > > Option 1) Instead of running one mirrormaker process with topic filter
> > > > > > ".+", we can start multiple mirrormaker processes with a topic filter
> > > > > > matching each topic (e.g. mirrormaker1 => whitelist topic1.*, mirrormaker2
> > > > > > => whitelist topic2.*, etc.)
> > > > > >
> > > > > > But this adds some operations overhead to start and manage multiple
> > > > > > processes on the host.
> > > > > >
> > > > > > Option 2) Modify the mirrormaker code to support a list of whitelist
> > > > > > filters, and have it create message streams for each filter
> > > > > > (call createMessageStreamsByFilter for each filter).
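
Both options come down to splitting the topic set by whitelist pattern. The Python sketch below shows which topics each pattern would pick up, and which would be missed. It assumes whole-name regex matching and uses Python's `re` module, whose syntax differs in places from the Java regexes Kafka uses; `group_by_whitelist` and the topic names are hypothetical.

```python
import re


def group_by_whitelist(topics, whitelists):
    """Map each whitelist regex to the topics it fully matches.

    Hypothetical helper. Whole-name matching is an assumption here;
    check Kafka's TopicFilter for the exact rules it applies.
    """
    compiled = [(w, re.compile(w)) for w in whitelists]
    groups = {w: [] for w in whitelists}
    unmatched = []
    for topic in sorted(topics):
        hits = [w for w, rx in compiled if rx.fullmatch(topic)]
        for w in hits:
            groups[w].append(topic)
        if not hits:
            unmatched.append(topic)
    return groups, unmatched


groups, unmatched = group_by_whitelist(
    ["topic1.a", "topic1.b", "topic2.a", "other"],
    ["topic1.*", "topic2.*"],
)
```

A topic that lands in more than one group would be mirrored more than once, and anything in `unmatched` would not be mirrored at all, so either option needs the set of whitelists to be kept in step with the topics that exist.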
> > > > > >
> > > > > > What would be your recommendation? If adding this feature to mirrormaker
> > > > > > is worthwhile for kafka, we can do option 2.
> > > > > >
> > > > > > Thanks,
> > > > > > Raja.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]>