Kafka, mail # user - Num of streams for consumers using TopicFilter.


Rajasekar Elango 2013-08-29, 15:03
Neha Narkhede 2013-08-29, 16:42
Rajasekar Elango 2013-08-29, 19:44
Jun Rao 2013-08-30, 03:42
Rajasekar Elango 2013-08-30, 03:52
Jun Rao 2013-08-30, 04:02
Rajasekar Elango 2013-08-30, 04:09
Jun Rao 2013-08-30, 14:34
Rajasekar Elango 2013-08-30, 15:24
Jun Rao 2013-08-31, 03:41
Jason Rosenberg 2013-10-02, 20:29
Jun Rao 2013-10-03, 04:24
Re: Num of streams for consumers using TopicFilter.
Jason Rosenberg 2013-10-03, 04:43
Jun,

Thanks! Can you point me to the client code to issue a metadata request?

Jason
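[Editor's note: in the 0.8-era Scala/Java client, a metadata request can be issued through kafka.javaapi.SimpleConsumer; passing an empty topic list returns metadata for all topics, as Jun describes below. A minimal sketch follows; the broker host, port, timeouts, and client id are placeholders.]

```java
import java.util.Collections;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class ListAllTopics {
    public static void main(String[] args) {
        // Placeholder coordinates; point this at any live broker.
        SimpleConsumer consumer =
            new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "topic-lister");
        try {
            // An empty topic list asks the broker for metadata on all topics.
            TopicMetadataRequest request =
                new TopicMetadataRequest(Collections.<String>emptyList());
            TopicMetadataResponse response = consumer.send(request);
            for (TopicMetadata metadata : response.topicsMetadata()) {
                System.out.println(metadata.topic());
            }
        } finally {
            consumer.close();
        }
    }
}
```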
On Thu, Oct 3, 2013 at 12:24 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> It's fixable. Since we plan to rewrite the consumer client code in the near
> future, it could be considered at that point.
>
> If you issue a metadata request with an empty topic list, you will get back
> the metadata of all topics.
>
> Thanks,
>
> Jun
>
>
> On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > How hard would it be to fix this issue, where we have a topic filter
> > that matches multiple topics, for the load to be distributed over
> > multiple threads, and over multiple consumers?  For some reason, I had
> > thought this issue was fixed in 0.8, but I guess not?
> >
> > I am currently using a single partition, for multiple topics.  I worry
> > that it won't scale ultimately to only ever have one thread on one
> > consumer doing all the work... We could move to multiple partitions,
> > but for ordering reasons in some use cases, this is not always ideal.
> >
> > Perhaps I can come up with some sort of dynamic topic sniffer, and have
> > it evenly divide the available topics between the available consumers
> > (and threads per consumer)!  Is there a simple api within the kafka
> > client code, for getting the list of topics?
> >
> > Jason
> >
> >
> >
> > On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > It seems to me option 1) is easier. Option 2) has the same issue as
> > > option 1), since you have to manage different whitelists.
> > >
> > > A more general solution is probably to change the consumer
> > > distribution model to divide partitions across topics. That way, one
> > > can create as many streams as the total # partitions for all topics.
> > > We can look into that in the future.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yeah. The actual bottleneck is the number of topics that match the
> > > > topic filter. The num of streams is going to be shared between all
> > > > the topics it's consuming from. I thought about the following ideas
> > > > to work around this. (I am basically referring to the mirrormaker
> > > > consumer in examples.)
> > > >
> > > > Option 1) Instead of running one mirrormaker process with topic
> > > > filter ".+", we can start multiple mirrormaker processes with a
> > > > topic filter matching each topic (Eg: mirrormaker1 => whitelist
> > > > topic1.*, mirrormaker2 => whitelist topic2.* etc).
> > > >
> > > > But this adds some operations overhead to start and manage multiple
> > > > processes on the host.
> > > >
> > > > Option 2) Modify mirrormaker code to support a list of whitelist
> > > > filters, and it should create message streams for each filter
> > > > (call createMessageStreamsByFilter for each filter).
> > > >
> > > > What would be your recommendation? If adding this feature to
> > > > mirrormaker is worthwhile for Kafka, we can do option 2.
> > > >
> > > > Thanks,
> > > > Raja.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Right, but if you set #partitions in each topic to 16, you can
> > > > > use a total of 16 streams.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > With option 1) I can't really use 8 streams in each consumer;
> > > > > > if I do, only one consumer seems to be doing all the work. So I
> > > > > > had to actually use a total of 8 streams, with 4 for each
> > > > > > consumer.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > The drawback of 2), as you said, is no auto failover. I was
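[Editor's note: the "dynamic topic sniffer" idea above, evenly dividing the matched topics across the available streams, can be sketched as plain round-robin assignment. This is an illustrative helper, not an existing Kafka API; the class and method names are hypothetical. Sorting the topic list first makes the split deterministic, so every consumer that sees the same topic list computes the same assignment.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "topic sniffer" helper: divide the topics matched by a
// filter across a fixed number of consumer streams.
public class TopicAssigner {

    // Round-robin the sorted topics over the stream slots 0..numStreams-1.
    public static Map<Integer, List<String>> assign(List<String> topics, int numStreams) {
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int i = 0; i < numStreams; i++) {
            assignment.put(i, new ArrayList<String>());
        }
        List<String> sorted = new ArrayList<>(topics);
        Collections.sort(sorted); // deterministic order across consumers
        for (int i = 0; i < sorted.size(); i++) {
            assignment.get(i % numStreams).add(sorted.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("t3", "t1", "t4", "t2", "t5");
        // prints {0=[t1, t3, t5], 1=[t2, t4]}
        System.out.println(assign(topics, 2));
    }
}
```

Each consumer would rerun the assignment whenever the matched topic set changes; unlike Kafka's built-in rebalancing, nothing here provides automatic failover.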

 
Jun Rao 2013-10-03, 14:25
Jason Rosenberg 2013-10-03, 14:57
Jason Rosenberg 2013-10-03, 15:12