I have an application that receives time series data, which I feed into
Kafka. Kafka in turn passes the data to Storm for some real-time
processing.

One of my use cases is that there might be a certain lag in my data. For
example, I might not get all the data for 2:00:00 PM together. It is
possible that the data for 2:00:00 PM does not all arrive at the same
time, and the application has to wait for all of it to arrive before
performing certain analytics.

For example, say at 2:00:00 PM I get 990 points, and another 10 points
arrive at 2:00:40 PM (say I know beforehand that there will be 1000 points
of data per millisecond). I have to wait for all the data to arrive before
performing the analytics.
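
To make the requirement concrete, the waiting logic could be sketched
roughly as follows. This is only an illustration in plain Python and is
independent of Kafka, Storm, or Redis; the class name CompletenessBuffer
and the expected_count parameter are made up for this example, and it
assumes the expected number of points per timestamp is known in advance.

```python
from collections import defaultdict


class CompletenessBuffer:
    """Hold points per timestamp until the expected number has arrived.

    Hypothetical helper for illustration only; not part of any library.
    """

    def __init__(self, expected_count):
        # Number of points that make a timestamp "complete" (assumed known).
        self.expected_count = expected_count
        # timestamp -> list of points received so far
        self._pending = defaultdict(list)

    def add(self, timestamp, point):
        """Store a point; return the full batch once complete, else None."""
        batch = self._pending[timestamp]
        batch.append(point)
        if len(batch) >= self.expected_count:
            # All points are in: hand the batch over and free the slot.
            return self._pending.pop(timestamp)
        return None

    def pending_timestamps(self):
        """Timestamps that are still waiting for late points."""
        return sorted(self._pending)


# Usage: three points expected per timestamp, one of them arriving late.
buf = CompletenessBuffer(expected_count=3)
buf.add("14:00:00", 1.0)
buf.add("14:00:00", 2.0)
complete = buf.add("14:00:00", 3.0)  # the late point completes the batch
if complete is not None:
    print("ready for analytics:", complete)
```

A real version would presumably also need a timeout for timestamps that
never become complete, which this sketch leaves out.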

Where should I place this application logic: (1) in Kafka, (2) in Storm,
or should I use something like Redis to collect all the timestamped data
and, only when I have all the points for a particular time, give it to

I am confused :) Any help would be appreciated. Sorry for any grammatical
errors; I was just thinking aloud and jotting down my question.

