Kafka >> mail # user >> kafka for user click tracking, how would this work?


Trying to understand how Kafka could be used in the following scenario:

Say I am creating a SaaS application for website click tracking.  A
client would paste some JavaScript on their website, and any link clicked
on their site would result in an API call that logs the click (IP address,
link metadata, timestamp, session GUID, etc.).

Since these API calls are coming from remote servers, I'm guessing I would
be wrapping the calls to Kafka via an HTTP server, e.g. a Jetty servlet
handler would take the HTTP call made via the API and then write to a Kafka
topic.

Am I right so far?
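For what it's worth, the servlet-side step could be sketched roughly as below. This is a stdlib-only sketch: the parameter names (`ip`, `link`, `session`) and the JSON layout are illustrative assumptions, and the actual produce call (which would use the kafka-clients `KafkaProducer`) is only described in the comments, not implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of what a Jetty servlet handler might do with an incoming
// tracking request before producing to Kafka. Parameter and field names
// are assumptions for illustration, not part of any Kafka API.
public class ClickEndpoint {

    // Parse a query string like "ip=1.2.3.4&link=/pricing&session=abc-123"
    // into a parameter map (a servlet container would normally do this).
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Build the message payload that would be handed to a KafkaProducer,
    // e.g. producer.send(new ProducerRecord<>("clicks", sessionGuid, payload)).
    static String toMessage(Map<String, String> p, long timestampMillis) {
        return String.format(
            "{\"ip\":\"%s\",\"link\":\"%s\",\"session\":\"%s\",\"ts\":%d}",
            p.get("ip"), p.get("link"), p.get("session"), timestampMillis);
    }
}
```

So yes, the servlet is just a thin HTTP front that turns each request into one message on a topic.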

Now, how could I partition the data in a way that would make consuming more
efficient? I.e., I am tracking click counts for visitors to a website, so it
is probable that a user will have multiple messages written to Kafka in a
given session. On the consumer end, if I could read in batches and
aggregate before writing the 'rolled up' data to MySQL, that would be ideal.
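One way to get that grouping is key-based partitioning: if the session GUID is used as the message key, Kafka's default partitioner hashes the key so that all clicks from one session land on the same partition, and therefore on the same consumer. A stdlib-only sketch of the idea (Kafka's own DefaultPartitioner uses murmur2 on the key bytes; plain `hashCode` here is a stand-in for illustration):

```java
// Sketch of key-based partitioning: a fixed hash of the session GUID,
// modulo the partition count, always yields the same partition for the
// same session, which is what keeps one session's clicks together.
public class SessionPartitioner {

    static int partitionFor(String sessionGuid, int numPartitions) {
        // Mask the sign bit so negative hash codes still map into
        // the valid partition range [0, numPartitions).
        int hash = sessionGuid.hashCode();
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

With that in place, a single consumer sees every message for a given session and can safely aggregate them before touching MySQL.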

I read the Kafka design page, and I understand at a high level that
consumers can be 'grouped'.

Looking for someone to clarify how this use case could be solved with Kafka,
particularly how partitioning and consumption work (still not 100% clear on
those; hopefully this sample use case will clear that up).
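On the consumer end, the rollup step itself is simple once a batch of messages is in hand. In the sketch below a plain `List` of session GUIDs stands in for the records a consumer would poll from its assigned partitions; the resulting per-session counts are what would be written to MySQL in one batch (the Kafka consumer-group wiring is deliberately left out of this stdlib-only sketch):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the consumer-side 'roll up': count clicks per session GUID
// across one polled batch before issuing a single batched MySQL write,
// instead of one write per click message.
public class ClickRollup {

    static Map<String, Integer> rollUp(List<String> sessionGuids) {
        Map<String, Integer> counts = new HashMap<>();
        for (String guid : sessionGuids) {
            // merge() inserts 1 on first sight of a session,
            // otherwise adds 1 to the existing count.
            counts.merge(guid, 1, Integer::sum);
        }
        return counts;
    }
}
```

Because session-keyed partitioning keeps a session's messages on one partition, no cross-consumer coordination is needed for these counts.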