One other option is to use something like Druid (http://druid.io),
especially if you care about doing arbitrary dimensional drilldowns.
It reads from Kafka and can do simple rollups for you automatically
(meaning you don't need Storm if all you are doing with it is a simple
"group by" style rollup). If you need to do real-time joins of various
streams, it's fairly easy to use Storm to do that and push the data into
Druid as well.
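
To make the "group by" style rollup concrete, here is a rough sketch in plain Python of the kind of aggregation I mean. It is not Druid's API or its implementation; the field names (ts, page, country, count) and the one-minute granularity are made up for illustration.

```python
from collections import defaultdict

def rollup(events, granularity_s=60):
    """Sum `count` per (time bucket, page, country).

    Hypothetical event shape: dicts with `ts` (epoch seconds),
    `page`, `country`, and `count`. None of this is a Druid schema.
    """
    totals = defaultdict(int)
    for e in events:
        # Truncate the timestamp down to the start of its bucket.
        bucket = e["ts"] - e["ts"] % granularity_s
        totals[(bucket, e["page"], e["country"])] += e["count"]
    return dict(totals)

events = [
    {"ts": 120, "page": "/home", "country": "US", "count": 1},
    {"ts": 125, "page": "/home", "country": "US", "count": 2},
    {"ts": 130, "page": "/docs", "country": "DE", "count": 1},
    {"ts": 185, "page": "/home", "country": "US", "count": 4},
]
result = rollup(events)
```

The two /home events in the first minute collapse into one row, which is the whole point: you store one row per bucket and dimension combination rather than one row per event.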
Druid handles the delayed messages issue by allowing for a configurable
time window in which messages can be delayed (we run it with a 10 minute
window). Using Druid would be similar to Travis's setup, except it would
allow you to ingest the data in real-time and query the data as it is being
ingested, instead of having to wait for the persist to S3 and load into
Also, I'm biased about Druid ;).
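
For the delayed-message window mentioned above, the idea can be sketched roughly like this. It is a simplified model in plain Python, not Druid code: the function name and the decision to look only at late events are my assumptions, and the 600 seconds just mirrors the 10 minute window we run with.

```python
def within_window(event_ts, now, window_s=600):
    """Return True if an event is no more than `window_s` seconds late.

    Simplified model (an assumption, not Druid's actual logic):
    only lateness relative to `now` is checked.
    """
    return now - event_ts <= window_s

now = 10_000
on_time = within_window(9_700, now)   # 5 minutes late
too_late = within_window(9_000, now)  # ~16 minutes late
```

Events that fall outside the window are not ingested in real time, so the window is a tradeoff between tolerating stragglers and how long data for a given time range stays open.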
On Fri, Aug 30, 2013 at 5:47 PM, Dan Di Spaltro <[EMAIL PROTECTED]> wrote: