There should be a few Kafka-related blog posts from Clearspring coming soon.
On 01/13/2012 03:22 PM, Neha Narkhede wrote:
> Congratulations on this, Chris! Wondering if you would be writing a
> blog post on your experience of using Kafka for enabling live
> analytics?
> On Fri, Jan 13, 2012 at 11:46 AM, Chris Burroughs
> <[EMAIL PROTECTED]> wrote:
>> At Clearspring we have been using Apache Kafka since early 2011. It
>> powers the AddThis Live View analytics, and the update that
>> product recently received involved yet more Kafka (three cheers for the
>> log4j appender!).
>> The project that we originally started investigating Kafka for is
>> somewhat larger: taking all of the view activity data generated by
>> AddThis sharing tools and replacing pixels on a CDN with direct requests
>> to our datacenters. The obvious and exciting benefit is that this gives
>> us access to our data in seconds instead of waiting hours for access log
>> For that we have two datacenters, each with a web tier pushing to 60
>> Kafka servers (so 120 in total). Between the two DCs we employ custom
>> bi-directional replication, so that batch and nearline analytics
>> processes have access to a full copy of the data. We are receiving a
>> bit over 3 billion events per day, and expect total events ingested by
>> the system to grow briskly over the next year.
>> One choice that appears somewhat unusual and might be notable is that
>> we are currently using the low-level producer and consumers exclusively.
>> Each web server pushes to a local Kafka broker that it is co-located
>> with (we are fans of multi-tenancy where possible and didn't want two
>> different "kinds" of boxes; disk-oblivious web services and
>> sequential-IO-oriented Kafka were a natural fit), and our consumers are
>> all using Clearspring's analytics system, which already had
>> integrated stream consumption and check-pointing.
>> Please let me know if you have any questions. There ought to be some
>> blog posts with more details in the coming weeks.
>> There are a few blog posts and presentations about analytics at
>> Clearspring floating around. This one is the highest level overview: