Kafka >> mail # user >> Kafka in production at Clearspring


Re: Kafka in production at Clearspring
Thanks!

There should be a few Kafka related blog posts from Clearspring coming soon.

On 01/13/2012 03:22 PM, Neha Narkhede wrote:
> Congratulations on this, Chris! Are you planning to write a
> blog post on your experience of using Kafka to enable live
> analytics?
>
> Thanks
> Neha
>
> On Fri, Jan 13, 2012 at 11:46 AM, Chris Burroughs
> <[EMAIL PROTECTED]> wrote:
>> At Clearspring we have been using Apache Kafka since early 2011.  It
>> powers the AddThis Live View analytics [1] and the update [2] that
>> product recently received involved yet more Kafka (three cheers for the
>> log4j appender!).
>>
>> The project that we originally started investigating Kafka for is
>> somewhat larger: taking all of the view activity data generated by
>> AddThis sharing tools and replacing pixels on a CDN with direct requests
>> to our datacenters. The obvious and exciting benefit is that this gives
>> us access to our data in seconds instead of waiting hours for access log
>> delivery.
>>
>> For that we have two datacenters, each with a web tier pushing to 60
>> Kafka servers (so 120 in total).  Between the two DCs we employ custom
>> bi-directional replication, so that batch and nearline analytics
>> processes have access to a full copy of the data.  We are receiving a
>> bit over 3 billion events per day, and expect total events ingested by
>> the system to grow briskly over the next year.
>>
>> One choice that appears somewhat unusual and might be notable is that
>> we are currently using the low-level producers and consumers exclusively.
>> Each web server pushes to a local Kafka broker that it is co-located
>> with. We are fans of multi-tenancy where possible and didn't want two
>> different "kinds" of boxes; disk-oblivious web services and
>> sequential-I/O-oriented Kafka were a natural fit. Our consumers all use
>> Clearspring's analytics system [3], which already had
>> integrated stream consumption and check-pointing.
>>
>> Please let me know if you have any questions.  There ought to be some
>> blog posts with more details in the coming weeks.
>>
>> [1]
>> http://www.addthis.com/blog/2011/06/21/social-data-in-real-time-with-addthis-live-view/
>>
>> [2]
>> http://www.addthis.com/blog/2011/12/20/expanded-addthis-analytics-now-available-in-live-view/
>>
>> [3] There are a few blog posts and presentations about analytics at
>> Clearspring floating around.  This one is the highest level overview:
>> http://www.clearspring.com/blog/2011/05/12/big-data-dc-analytics-at-clearspring/