Kafka, mail # user - Event processing use case/examples

Event processing use case/examples
Mark 2011-11-04, 16:27
I am struggling with some core design concepts, and I was hoping someone
could explain how they use Kafka in their production environment for
event processing. For example, I've read that LinkedIn has 60+ metrics
they collect and aggregate, i.e. page views, clicks, etc. I clearly grasp
the concept of logging a page view event to Kafka, but I'm missing the
last part: how does one go about aggregating this data and using it in
any way other than as a simple data sink?

Taking the "page_view" example further: what is the preferred way of
logging and consuming this event? Would you have a consumer that just
consumes page views? If so, how do you make sure you don't reconsume
the same message in the event of a consumer restart? Also, for
analytical/reporting needs, how do you deal with timeframes? Say my
consumer is subscribed to the "page_view" topic and I want all messages
from 8am-9am. Would I read all messages and filter out any that don't
have a timestamp in that range, or would I create a separate topic for
each hour, i.e. "page_view/08:00"? The same question applies to importing
all "page_views" for yesterday into Hadoop.
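
To make the options above concrete, here is a minimal sketch in plain
Python. It does not use any Kafka client; the event shape (offset,
timestamp, payload) and the helper names in_window, hourly_topic and
resume are all assumptions made up for illustration, not Kafka APIs.

```python
from datetime import datetime, timezone

# Hypothetical event shape: (offset, unix_timestamp_seconds, payload).
EVENTS = [
    (0, datetime(2011, 11, 4, 7, 59, tzinfo=timezone.utc).timestamp(), "a"),
    (1, datetime(2011, 11, 4, 8, 0, tzinfo=timezone.utc).timestamp(), "b"),
    (2, datetime(2011, 11, 4, 8, 30, tzinfo=timezone.utc).timestamp(), "c"),
    (3, datetime(2011, 11, 4, 9, 0, tzinfo=timezone.utc).timestamp(), "d"),
]

def in_window(events, start, end):
    """Option 1: read everything, keep events with start <= ts < end."""
    lo, hi = start.timestamp(), end.timestamp()
    return [e for e in events if lo <= e[1] < hi]

def hourly_topic(base, ts):
    """Option 2: derive a per-hour topic name such as 'page_view/08:00'."""
    hour = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{base}/{hour:%H}:00"

def resume(events, last_offset):
    """After a restart, skip events at or before the last saved offset."""
    return [e for e in events if e[0] > last_offset]
```

With option 1 the consumer does the filtering; with option 2 the
producer has to pick the topic at write time. The resume helper assumes
the consumer saves the last offset it processed somewhere durable.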

I know Kafka is a new project and I'm sure everyone's time is constrained,
but I think it would be helpful if some high-level examples/use cases
and best practices were added to the wiki. This could help gain adoption
and hopefully bring in more willing contributors :)

Thanks for your help