Kafka, mail # user - Using Kafka for "data" messages


Re: Using Kafka for "data" messages
Mahendra M 2013-06-14, 12:03
Hi Josh,

Thanks for clarifying the use case. The idea is good, but I see the
following three issues:

   1. Creating a queue for each user; there may be limits on this.
   2. Removing old queues.
   3. If the same user logs in from multiple browsers, things get a bit
   more complex.

May I suggest an alternative approach: a combination of Kafka and
XMPP-BOSH/Comet.

   1. The user logs in. A message is sent on a Kafka queue.
   2. The web browser starts a long-polling connection to a server
   (XMPP-BOSH / Comet).
   3. Consumers pick up the message from (1) and do their job. They push
   their results to a results queue and to an XMPP end-point ([EMAIL PROTECTED]).
   4. The recommender picks up from the results queue and pushes its
   results to the XMPP end-point.
   5. The web front-end picks up the messages and displays them.
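To make the flow concrete, here is a minimal in-memory sketch of the five
steps. Everything here is a stand-in: the queues are plain Python deques
rather than real Kafka topics, the XMPP endpoint is a list per user, and
all names ("orders", "searches", the message fields) are hypothetical.

```python
from collections import defaultdict, deque

# In-memory stand-ins (hypothetical) for the Kafka queues and XMPP endpoints.
login_topic = deque()                # step 1: login events
results_topic = deque()              # step 3/4: the results queue
xmpp_endpoints = defaultdict(list)   # step 2: one long-poll endpoint per user

def login(user_id):
    """Step 1: user logs in; an event goes onto the Kafka queue."""
    login_topic.append({"user": user_id})

def backend_consumer(name):
    """Step 3: a consumer picks up the login event, does its job, and pushes
    its result to both the results queue and the user's XMPP endpoint."""
    for event in list(login_topic):
        result = {"user": event["user"], "source": name}
        results_topic.append(result)
        xmpp_endpoints[event["user"]].append(result)

def recommender():
    """Step 4: the recommender reads the results queue and pushes its
    recommendation to the same XMPP endpoint."""
    for user in {r["user"] for r in results_topic}:
        xmpp_endpoints[user].append({"user": user, "source": "recommender"})

login("user123")
backend_consumer("orders")
backend_consumer("searches")
recommender()
# Step 5: the web front-end drains the user's endpoint and displays it.
print([m["source"] for m in xmpp_endpoints["user123"]])
```

Note how nothing here is keyed by a per-user queue: only the XMPP endpoint
is per-user, which is exactly the part XMPP servers already handle well.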

If you take this further, you could avoid Kafka entirely in this use case
and use XMPP alone (for steps 1 and 3).

This way, you don't have to manage a large number of queues, remove them,
and so on. XMPP is also very good at handling multiple end-points for a
single user. (There are good XMPP servers like ejabberd and Tigase, and
good lightweight JS libraries for handling the connections.)

PS: I think my reply is going off-topic, so I will stop here.

Regards,
Mahendra
On Thu, Jun 13, 2013 at 11:17 PM, Josh Foure <[EMAIL PROTECTED]> wrote:

> Hi Mahendra, I think that is where it gets a little tricky.  I think it
> would work something like this:
>
> 1.  Web sends login event for user "user123" to topic "GUEST_EVENT".
> 2.  All of the systems consume those messages and publish the data
> messages to topic "GUEST_DATA.user123".
> 3.  The Recommendation system gets all of the data from
> "GUEST_DATA.user123", processes and then publishes back to the same topic
> "GUEST_DATA.user123".
> 4.  The Web consumes the messages from the same topic (there is a
> different topic for every user that logged in) "GUEST_DATA.user123" and
> when it finds the recommendation messages it pushes that to the browser
> (note it will need to read all the other data messages and discard those
> when looking for the recommendation messages).  I have a concern that the
> Web will be flooded with a ton of messages that it will promptly drop but I
> don't want to create a new "response" or "recommendation" topic because
> then I feel like I am tightly coupling the message to the functionality and
> in the future different systems may want to consume those messages as well.
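The filtering in step 4 above might look roughly like this. The message
shapes and the `type` field are illustrative assumptions only; Kafka
messages are opaque bytes, so any such schema would be the application's
own choice, not something Kafka provides.

```python
# Hypothetical contents of the per-user topic "GUEST_DATA.user123":
# each system's data message plus the recommender's output, interleaved.
guest_data_user123 = [
    {"type": "name", "payload": "name-data"},
    {"type": "address", "payload": "address-data"},
    {"type": "recent_searches", "payload": "search-data"},
    {"type": "recommendation", "payload": "rec-1"},
    {"type": "orders", "payload": "order-data"},
    {"type": "recommendation", "payload": "rec-2"},
]

# Step 4: the web tier reads everything, keeps only the recommendations,
# and discards the rest.
recommendations = [m for m in guest_data_user123
                   if m["type"] == "recommendation"]
print([m["payload"] for m in recommendations])
```

This also makes the stated concern concrete: the web tier consumes every
data message on the topic only to drop most of them.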
>
> Does that make sense?
> Josh
>
>
>
>
>
>
> ________________________________
>  From: Mahendra M <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Josh Foure <[EMAIL PROTECTED]>
> Sent: Thursday, June 13, 2013 12:56 PM
> Subject: Re: Using Kafka for "data" messages
>
>
> Hi Josh,
>
> The idea looks very interesting. I just had one doubt.
>
> 1. A user logs in. His login id is sent on a topic
> 2. Other systems (consumers on this topic) consume this message and
> publish their results to another topic.
>
> This will be happening without any particular order for hundreds of users.
>
> Now, as the site is being displayed to the user: how will you fetch only
> the messages for that user from the queue?
>
> Regards,
> Mahendra
>
>
>
> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <[EMAIL PROTECTED]> wrote:
>
> >
> > Hi all, my team is proposing a novel
> > way of using Kafka and I am hoping someone can help do a sanity check on
> > this:
> >
> > 1.  When a user logs
> > into our website, we will create a “logged in” event message in Kafka
> > containing the user id.
> > 2.  30+ systems
> > (consumers each in their own consumer groups) will consume this event and
> > lookup data about this user id.  They
> > will then publish all of this data back out into Kafka as a series of
> > data messages.  One message may include the user’s name,
> > another the user’s address, another the user’s last 10 searches, another
> > their
> > last 10 orders, etc.  The plan is that a
> > single “logged in” event may trigger hundreds if not thousands of

Mahendra

http://twitter.com/mahendra