Kafka >> mail # user >> kafka for user click tracking, how would this work?


Trying to understand how Kafka could be used in the following scenario:

Say I am creating a SaaS application for website click tracking.  A
client would paste some JavaScript on their website, and any link
clicked on their website would result in an API call that logs the
click (IP address, link metadata, timestamp, session GUID, etc.).

Since these API calls are coming from remote servers, I'm guessing I
would wrap the calls to Kafka in an HTTP server, e.g. a Jetty servlet
handler would take the HTTP call made via the API and then write to a
Kafka topic.

Am I right so far?
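That is roughly the shape such a handler could take. A minimal sketch of the message-building step is below; the tab-delimited payload format, the field names, and the "clicks" topic are all assumptions for illustration, not an established schema:

```java
// Sketch: turn an incoming click (as received by the servlet handler)
// into a payload string for a Kafka message. The delimited format and
// field names here are illustrative assumptions only.
public class ClickMessage {
    // Build the payload the servlet would hand to a Kafka producer.
    static String payload(String ip, String link, String sessionGuid, long ts) {
        return ip + "\t" + link + "\t" + sessionGuid + "\t" + ts;
    }

    public static void main(String[] args) {
        String msg = payload("203.0.113.9", "/pricing", "sess-42", 1690000000L);
        // In the servlet, this string would then be sent to Kafka,
        // e.g. with a producer call along the lines of:
        //   producer.send(new ProducerRecord<>("clicks", sessionGuid, msg));
        // (topic name and record shape are assumptions here)
        System.out.println(msg);
    }
}
```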

Now, how could I partition the data in a way that would make consuming
more efficient? I.e., I am tracking click counts for visitors to a
website, and it is probable that a user will have multiple messages
written to Kafka in a given session, so on the consumer end, if I could
read in batches and aggregate before writing the 'rolled-up' data to
MySQL, that would be ideal.
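One way to get that locality is to key each message by the session GUID: Kafka's default partitioner hashes the message key, so every click from one session lands on the same partition and is therefore read by the same consumer. The hash-modulo sketch below mimics that idea; it is an illustration of the principle, not Kafka's exact hash function:

```java
// Sketch of key-based partitioning: hashing the session GUID and
// taking it modulo the partition count maps every message for a given
// session to one stable partition. This mimics what a key-hashing
// partitioner does; it is not Kafka's actual hash implementation.
public class SessionPartitioner {
    static int partitionFor(String sessionGuid, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (sessionGuid.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("sess-42", 8);
        int p2 = partitionFor("sess-42", 8);
        System.out.println(p1 == p2); // same session, same partition every time
    }
}
```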

I read the Kafka design page, and I understand at a high level that
consumers can be 'grouped'.
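Within such a group, each consumer owns a subset of the partitions, so with session-keyed messages each consumer sees all clicks for its sessions and can do the batch rollup locally before touching MySQL. A minimal sketch of that aggregation step, assuming a hypothetical record shape of (sessionGuid, link) pairs:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the consumer-side rollup: take a batch of click records
// read from a partition and aggregate per-session counts, so only one
// rolled-up row per session needs to be written to MySQL.
// The String[] {sessionGuid, link} record shape is an assumption.
public class ClickRollup {
    static Map<String, Integer> rollup(List<String[]> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] record : batch) {
            String sessionGuid = record[0];
            // Increment this session's click count.
            counts.merge(sessionGuid, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> batch = java.util.Arrays.asList(
                new String[]{"s1", "/a"},
                new String[]{"s1", "/b"},
                new String[]{"s2", "/a"});
        // Write counts.entrySet() to MySQL instead of one row per click.
        System.out.println(rollup(batch));
    }
}
```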

Looking for someone to clarify how this use case could be solved with
Kafka, particularly how partitioning and consumption work (I'm still
not 100% clear on those, and hopefully this sample use case will clear
that up).