Kafka user mailing list: Kafka in production at Clearspring

Re: Kafka in production at Clearspring
Chris Burroughs 2012-01-13, 21:13
Thanks!

There should be a few Kafka-related blog posts from Clearspring coming soon.

On 01/13/2012 03:22 PM, Neha Narkhede wrote:
> Congratulations on this, Chris! Are you planning to write a blog
> post on your experience of using Kafka to enable live
> analytics?
>
> Thanks
> Neha
>
> On Fri, Jan 13, 2012 at 11:46 AM, Chris Burroughs
> <[EMAIL PROTECTED]> wrote:
>> At Clearspring we have been using Apache Kafka since early 2011.  It
>> powers AddThis Live View analytics [1], and the update [2] that the
>> product recently received involved yet more Kafka (three cheers for the
>> log4j appender!).
>>
>> The project we originally started investigating Kafka for is
>> somewhat larger: taking all of the view activity data generated by
>> AddThis sharing tools and replacing pixels on a CDN with direct requests
>> to our datacenters. The obvious and exciting benefit is that this gives
>> us access to our data in seconds instead of waiting hours for access log
>> delivery.
>>
>> For that we have two datacenters, each with a web tier pushing to 60
>> Kafka servers (so 120 in total).  Between the two DCs we employ custom
>> bi-directional replication, so that batch and nearline analytics
>> processes have access to a full copy of the data.  We are receiving a
>> bit over 3 billion events per day, and expect total events ingested by
>> the system to grow briskly over the next year.
>>
>> One choice that appears somewhat unusual, and might be notable, is that
>> we currently use the low-level producers/consumers exclusively.
>> Each web server pushes to a local Kafka broker that it is co-located
>> with (we are fans of multi-tenancy where possible and didn't want two
>> different "kinds" of boxes; disk-oblivious web services and
>> sequential-I/O-oriented Kafka were a natural fit), and our consumers all
>> use Clearspring's analytics system [3], which already had integrated
>> stream consumption and checkpointing.
>>
>> Please let me know if you have any questions.  There ought to be some
>> blog posts with more details in the coming weeks.
>>
>> [1]
>> http://www.addthis.com/blog/2011/06/21/social-data-in-real-time-with-addthis-live-view/
>>
>> [2]
>> http://www.addthis.com/blog/2011/12/20/expanded-addthis-analytics-now-available-in-live-view/
>>
>> [3] There are a few blog posts and presentations about analytics at
>> Clearspring floating around. This one is the highest-level overview:
>> http://www.clearspring.com/blog/2011/05/12/big-data-dc-analytics-at-clearspring/
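The thread mentions that Clearspring's consumers use only the low-level consumer API and rely on an analytics system that already handles stream consumption and checkpointing. The message does not say how that checkpointing works, so what follows is only a generic sketch of the pattern under stated assumptions: the consumer tracks its own offset, processes a batch, and then durably records the new offset. It does not use any Kafka client API. `InMemoryPartition`, `consume`, `load_checkpoint`, `save_checkpoint`, and the JSON checkpoint file are all hypothetical names that stand in for a broker partition and for Clearspring's system. Offsets here are message indexes for simplicity, whereas Kafka brokers of that era addressed messages by byte offset.

```python
import json
import os
import tempfile
from typing import Callable, List


class InMemoryPartition:
    """Hypothetical stand-in for one broker partition: an append-only list.

    Offsets are message indexes (a simplification, not Kafka's byte offsets).
    """

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, message: str) -> None:
        self._log.append(message)

    def fetch(self, offset: int, max_messages: int) -> List[str]:
        # Return up to max_messages starting at offset; empty once caught up.
        return self._log[offset:offset + max_messages]


def load_checkpoint(path: str) -> int:
    # With no checkpoint yet, start from the beginning of the partition.
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["offset"]


def save_checkpoint(path: str, offset: int) -> None:
    # Write to a temp file in the same directory, then atomically rename,
    # so a crash never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, path)


def consume(partition: InMemoryPartition, checkpoint_path: str,
            handle: Callable[[str], None], batch_size: int = 100) -> int:
    """Drain the partition from the last checkpoint and return the new offset."""
    offset = load_checkpoint(checkpoint_path)
    while True:
        batch = partition.fetch(offset, batch_size)
        if not batch:
            return offset
        for message in batch:
            handle(message)
        offset += len(batch)
        # Checkpoint only after the whole batch is handled (at-least-once).
        save_checkpoint(checkpoint_path, offset)
```

Checkpointing once per batch rather than once per message means a crash can replay up to one batch of already-handled events, in exchange for far fewer checkpoint writes; a restarted consumer simply calls `consume` again and resumes from the saved offset.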