Trying to understand how Kafka could be used in the following scenario:
Say I am creating a SaaS application for website click tracking. So a
and any link clicked on their website would result in an API call that would
log the click (IP address, link metadata, timestamp, session GUID, etc.).
Since these API calls are coming from remote servers, I'm guessing I would
be wrapping the calls to Kafka in an HTTP server, e.g. a Jetty servlet
handler would take the HTTP call made via the API and then write to a Kafka
Am I right so far?
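
To make the setup concrete, here is a minimal sketch of the wrapper idea in Python. It is only an illustration under stated assumptions: `InMemoryProducer` is a hypothetical stand-in for whatever Kafka producer client is used, and the `session` and `link` query parameters, the `clicks` topic name, and `handle_click` itself are made up for the example.

```python
import json
import time
from urllib.parse import parse_qs


class InMemoryProducer:
    """Hypothetical stand-in for a real Kafka producer client."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key, value):
        # A real client would publish to the broker; here we just record it.
        self.sent.append((topic, key, value))


def handle_click(query_string, remote_ip, producer, topic="clicks"):
    """Turn one tracking API call into one keyed message."""
    params = parse_qs(query_string)
    session = params["session"][0]
    event = {
        "ip": remote_ip,
        "link": params["link"][0],
        "session": session,
        "timestamp": int(time.time() * 1000),
    }
    # Key by session GUID so a session's clicks can be routed together.
    producer.send(topic, session, json.dumps(event))
    return event


producer = InMemoryProducer()
handle_click("session=abc-123&link=%2Fpricing", "203.0.113.7", producer)
```

The HTTP layer (a servlet handler or similar) would only need to extract the query string and remote address and pass them to a function like this.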
Now how could I partition the data in a way that would make consuming more
For example, I am tracking click counts for visitors to a website, and a
user will probably have multiple messages written to Kafka in a given
session. So on the consumer end, it would be ideal if I could read in
batches and aggregate before writing the 'rolled up' data to MySQL.
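
A rough sketch of what I mean, in Python. The assumptions are mine: `partition_for` uses a CRC32 hash modulo the partition count purely for illustration (it is not Kafka's own partitioner), and `roll_up` and the event fields are hypothetical names. The point is that keying on the session GUID sends every click in a session to the same partition, so one consumer can aggregate a batch before writing to the database.

```python
import zlib
from collections import Counter


def partition_for(session_guid, num_partitions):
    """Map a session GUID to a partition deterministically (illustrative hash)."""
    return zlib.crc32(session_guid.encode("utf-8")) % num_partitions


def roll_up(batch):
    """Count clicks per (session, link) for one batch of consumed events."""
    counts = Counter()
    for event in batch:
        counts[(event["session"], event["link"])] += 1
    return counts


batch = [
    {"session": "abc-123", "link": "/pricing"},
    {"session": "abc-123", "link": "/pricing"},
    {"session": "abc-123", "link": "/docs"},
    {"session": "def-456", "link": "/pricing"},
]
totals = roll_up(batch)
```

Each entry in `totals` would then become a single insert or update against MySQL rather than one write per click.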
I read the Kafka design page, and I understand at a high level that
consumers can be 'grouped'.
Looking for someone to clarify how this use case could be solved with Kafka,
and how partitioning and consumption work (I'm still not 100% clear on
those, and hopefully this sample use case will clear that up).
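
My current mental model of grouping, sketched in Python: within one consumer group, each partition is read by only one consumer, so the partitions get divided among the group's members. The round-robin split below is an assumption for illustration only, not necessarily the assignment scheme Kafka uses, and `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Divide partitions among a group's consumers, one owner per partition."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


group = assign_partitions(range(4), ["consumer-a", "consumer-b"])
```

If that model is right, then all clicks for a given session (one partition) would be seen by a single consumer in the group, which is what the batch aggregation needs.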
Neha Narkhede 2012-05-02, 16:31
S Ahmed 2012-05-02, 17:55
Neha Narkhede 2012-05-02, 19:13