Kafka in production at Clearspring
At Clearspring we have been using Apache Kafka since early 2011.  It
powers the AddThis Live View analytics [1], and that product's recent
update [2] involved yet more Kafka (three cheers for the log4j
appender!).

The project that we originally started investigating Kafka for is
somewhat larger: taking all of the view activity data generated by
AddThis sharing tools and replacing pixels on a CDN with direct requests
to our datacenters. The obvious and exciting benefit is that this gives
us access to our data in seconds instead of waiting hours for access
logs.

For that we have two datacenters, each with a web tier pushing to 60
Kafka servers (so 120 in total).  Between the two DCs we employ custom
bi-directional replication, so that batch and nearline analytics
processes have access to a full copy of the data.  We are receiving a
bit over 3 billion events per day, and expect total events ingested by
the system to grow briskly over the next year.
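
To put those figures in per-machine terms, here is a rough
back-of-envelope calculation. It is an illustrative sketch, not
Clearspring's capacity planning, and it rests on simplifying assumptions
the post does not state: events are spread evenly across both
datacenters and all 120 brokers, they arrive uniformly through the day
(so real peaks will be higher than these averages), and, hypothetically,
replicated data lands on the remote datacenter's brokers.

```python
# Back-of-envelope average load, using the figures from the post.
# ASSUMPTIONS (not stated in the post): perfectly even spread over DCs
# and brokers, uniform arrival over 24h, and replicated data stored on
# the remote DC's Kafka brokers.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
events_per_day = 3_000_000_000          # "a bit over 3 billion"
brokers_per_dc = 60
datacenters = 2
total_brokers = brokers_per_dc * datacenters  # 120

# Average ingest across the whole system.
cluster_rate = events_per_day / SECONDS_PER_DAY

# Average locally ingested events per broker (before replication).
per_broker_rate = cluster_rate / total_brokers

# If each DC holds a full copy, each broker stores twice its local share,
# i.e. the whole stream divided over one DC's brokers.
per_broker_stored_rate = cluster_rate / brokers_per_dc

print(f"cluster-wide: {cluster_rate:,.0f} events/s")              # 34,722
print(f"per broker (local): {per_broker_rate:,.0f} events/s")       # 289
print(f"per broker (replicated): {per_broker_stored_rate:,.0f} events/s")  # 579
```

Even doubled for replication, a few hundred events per second on average
is modest for a sequential-I/O system, which is consistent with sharing
each box with a web tier.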

One choice that appears somewhat unusual, and might be notable, is that
we currently use the low-level producers and consumers exclusively.
Each web server pushes to a local Kafka broker that it is co-located
with (we are fans of multi-tenancy where possible and didn't want two
different "kinds" of boxes; disk-oblivious web services and
sequential-I/O-oriented Kafka were a natural fit), and our consumers all
use Clearspring's analytics system [3], which already had stream
consumption and checkpointing built in.
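
With the low-level consumer, tracking which messages have been read is
left to the application, which is why built-in checkpointing matters.
The sketch below illustrates that idea only. It uses no Kafka API:
`CheckpointStore` and `consume` are hypothetical stand-ins, offsets are
simplified to list indices, and the checkpoint interval is an arbitrary
choice rather than anything described in the post.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for durable storage of consumer offsets."""
    offsets: dict = field(default_factory=dict)

    def load(self, partition: str) -> int:
        # Next offset to read; 0 if this partition was never checkpointed.
        return self.offsets.get(partition, 0)

    def save(self, partition: str, offset: int) -> None:
        self.offsets[partition] = offset


def consume(log, partition, store, process, checkpoint_every=2):
    """Resume from the last checkpoint, process each message in order,
    and record the next offset every `checkpoint_every` messages."""
    offset = store.load(partition)
    since_checkpoint = 0
    while offset < len(log):
        process(log[offset])
        offset += 1
        since_checkpoint += 1
        if since_checkpoint >= checkpoint_every:
            store.save(partition, offset)
            since_checkpoint = 0
    store.save(partition, offset)  # checkpoint at the end of available data
    return offset


# Usage: a second run (e.g. after a restart) picks up only new messages.
store = CheckpointStore()
seen = []
log = ["view-1", "view-2", "view-3"]
consume(log, "views-0", store, seen.append)
log.append("view-4")
consume(log, "views-0", store, seen.append)
print(seen)  # ['view-1', 'view-2', 'view-3', 'view-4']
```

Because the offset is saved after processing, a crash between two
checkpoints replays the uncheckpointed messages on restart: the
semantics are at-least-once, so downstream processing should tolerate
duplicates.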

Please let me know if you have any questions.  There ought to be some
blog posts with more details in the coming weeks.



[3] There are a few blog posts and presentations about analytics at
Clearspring floating around.  This one is the highest-level overview: