Kafka >> mail # user >> Using Kafka for "data" messages

Re: Using Kafka for "data" messages
Ah yes, I had read that Kafka likes to stay under 1,000 topics, but I wasn't sure whether that was really a limitation.  In principle I wouldn't mind having all guest events placed on the "GUEST_DATA" topic, but I thought that by having more topics I could minimize having consumers read messages only to discard them.

My thought had been that if I have 20 Web JVMs and at any given time 1,000 people are logged in per JVM, each JVM would only need to consume the messages from 1,000 topics.  If instead there is a single topic, each JVM will be consuming from the same topic (and be in a different consumer group), but 19 out of 20 messages will be for guests that are not even logged into that JVM.  Since Kafka doesn't have message selectors or anything like that, I was hoping to use topics to help segregate the traffic.

I don't want to use 1 topic per Web JVM because in the future other consumers may be interested in that same data, and the services that put the data in Kafka shouldn't have to look up which JVM a user is logged into (or get that from another message and keep track of it).  Any thoughts on how to work around this?  I know there are topic partitions, but if I understood correctly those seem more like a way to distribute the workload of storing the messages than a fit for the message selection scenario I am describing.
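
To make the "19 out of 20" figure concrete, here is a rough, broker-free sketch of the single-topic case with client-side filtering. It only illustrates the arithmetic in the message above and does not use any Kafka client API. The names build_sessions and consume, the user-id format, and the one-message-per-user workload are assumptions made for illustration.

```python
NUM_JVMS = 20          # figure taken from the message above
USERS_PER_JVM = 1000   # figure taken from the message above


def build_sessions(num_jvms, users_per_jvm):
    """Map each hypothetical JVM id to the set of user ids logged in there."""
    return {
        jvm: {f"user{jvm * users_per_jvm + i}" for i in range(users_per_jvm)}
        for jvm in range(num_jvms)
    }


def consume(messages, local_users):
    """Client-side filter: keep messages for local users, count the rest."""
    kept, discarded = [], 0
    for user_id, payload in messages:
        if user_id in local_users:
            kept.append((user_id, payload))
        else:
            discarded += 1
    return kept, discarded


sessions = build_sessions(NUM_JVMS, USERS_PER_JVM)
all_users = sorted(u for users in sessions.values() for u in users)

# Assumption: one message per logged-in user stands in for the shared
# "GUEST_DATA" topic; real traffic would not be this uniform.
topic = [(user_id, "data") for user_id in all_users]

# Every JVM reads the whole topic; look at what JVM 0 keeps and drops.
kept, discarded = consume(topic, sessions[0])
discard_ratio = discarded / len(topic)
print(len(kept), discarded, discard_ratio)
```

Under these assumptions each JVM keeps 1 message in 20 and drops the rest, which is the overhead the per-user topics were meant to avoid.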
 From: Timothy Chen <[EMAIL PROTECTED]>
Sent: Thursday, June 13, 2013 2:13 PM
Subject: Re: Using Kafka for "data" messages

Also since you're going to be creating a topic per user, the number of
concurrent users will also be a concern to Kafka as it doesn't like massive
amounts of topics.

On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure <[EMAIL PROTECTED]> wrote:

> Hi Mahendra, I think that is where it gets a little tricky.  I think it
> would work something like this:
> 1.  Web sends login event for user "user123" to topic "GUEST_EVENT".
> 2.  All of the systems consume those messages and publish the data
> messages to topic "GUEST_DATA.user123".
> 3.  The Recommendation system gets all of the data from
> "GUEST_DATA.user123", processes and then publishes back to the same topic
> "GUEST_DATA.user123".
> 4.  The Web consumes the messages from the same topic (there is a
> different topic for every user that logged in) "GUEST_DATA.user123" and
> when it finds the recommendation messages it pushes that to the browser
> (note it will need to read all the other data messages and discard those
> when looking for the recommendation messages).  I have a concern that the
> Web will be flooded with a ton of messages that it will promptly drop but I
> don't want to create a new "response" or "recommendation" topic because
> then I feel like I am tightly coupling the message to the functionality and
> in the future different systems may want to consume those messages as well.
> Does that make sense?
> Josh
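
The four steps above can be sketched with an in-memory stand-in for the broker. This is only an illustration of the message flow, not Kafka client code: the InMemoryBroker class, the "type" field on each message, and the "profile" and "orders" system names are all hypothetical.

```python
from collections import defaultdict


def user_topic(user_id):
    """Per-user topic name, following the "GUEST_DATA.user123" pattern."""
    return f"GUEST_DATA.{user_id}"


class InMemoryBroker:
    """Toy stand-in for a broker: topic name -> append-only message list."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def read(self, topic):
        return list(self.topics[topic])


broker = InMemoryBroker()

# Step 1: Web publishes the login event.
broker.publish("GUEST_EVENT", {"type": "login", "user": "user123"})

# Step 2: backend systems (hypothetical names) publish their data messages.
for system in ("profile", "orders"):
    broker.publish(user_topic("user123"), {"type": "data", "source": system})

# Step 3: the Recommendation system reads the data and publishes back
# to the same per-user topic.
inputs = [m for m in broker.read(user_topic("user123")) if m["type"] == "data"]
broker.publish(
    user_topic("user123"),
    {"type": "recommendation", "based_on": len(inputs)},
)


# Step 4: Web reads everything on the topic and keeps only recommendations.
def recommendations_for(broker, user_id):
    seen = broker.read(user_topic(user_id))
    recs = [m for m in seen if m["type"] == "recommendation"]
    return recs, len(seen) - len(recs)


recs, dropped = recommendations_for(broker, "user123")
print(recs, dropped)
```

In this sketch Web still reads and drops every data message on the user's topic, which is the flooding concern raised in step 4.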
> ________________________________
>  From: Mahendra M <[EMAIL PROTECTED]>
> Sent: Thursday, June 13, 2013 12:56 PM
> Subject: Re: Using Kafka for "data" messages
> Hi Josh,
> The idea looks very interesting. I just had one doubt.
> 1. A user logs in. His login id is sent on a topic
> 2. Other systems (consumers on this topic) consume this message and
> publish their results to another topic
> This will be happening without any particular order for hundreds of users.
> Now, when the site is being displayed to the user, how will you fetch only
> the messages for that user from the queue?
> Regards,
> Mahendra
> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <[EMAIL PROTECTED]> wrote:
> >
> > Hi all, my team is proposing a novel way of using Kafka and I am hoping
> > someone can help do a sanity check on this:
> >
> > 1.  When a user logs into our website, we will create a “logged in” event
> > message in Kafka containing the user id.