
Zhiwei Lin 2012-05-21, 20:01
Re: Stream data processing

How quickly do you have to get the result out once the new data is added?  How far back in time do you have to look for BBBB from the occurrence of bbbb?  Do you have to do this for all combinations of values or is it just a small subset of values?

--Bobby Evans

On 5/21/12 3:01 PM, "Zhiwei Lin" <[EMAIL PROTECTED]> wrote:

I have a large volume of streaming log data. Each record contains a
timestamp, which is very important to the analysis.
For example, the data format looks like this:
(1) 20:30:21 01/April/2012    AAAAA.............
(2) 20:30:51 01/April/2012    BBBB.............
(3) 21:30:21 01/April/2012    bbbb.............

Moreover, new data arrives every few minutes.
I have to calculate the probability of the occurrence of "bbbb" given the
occurrence of "BBBB" (where BBBB occurs earlier than bbbb). So, it is
really time-dependent.

I wonder if Hadoop is the right platform for this job. Is there any
package available for this kind of work?

Thank you.
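
For illustration only, here is a minimal single-machine sketch of the calculation described above. It is not a Hadoop job and not a method proposed by anyone in this thread. The function names (parse_record, conditional_probability), the two-hour look-ahead window, and matching events by payload prefix are all assumptions made for the example. The window is exactly the open question raised in the reply (how far back to look), so treat it as a placeholder.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Matches timestamps like "20:30:21 01/April/2012".
# %B assumes English month names (the default C locale).
TS_FORMAT = "%H:%M:%S %d/%B/%Y"


def parse_record(line):
    """Split a log line into (timestamp, payload). Hypothetical helper."""
    time_part, date_part, payload = line.split(None, 2)
    return datetime.strptime(f"{time_part} {date_part}", TS_FORMAT), payload


def conditional_probability(lines, cause="BBBB", effect="bbbb",
                            window=timedelta(hours=2)):
    """Estimate P(effect occurs within `window` after a cause | cause).

    The window length is an assumption; the original question does not
    say how far apart the two events may be.
    """
    records = [parse_record(line) for line in lines]
    cause_times = [ts for ts, payload in records if payload.startswith(cause)]
    effect_times = sorted(ts for ts, payload in records
                          if payload.startswith(effect))
    if not cause_times:
        return None  # no cause events, so the estimate is undefined

    hits = 0
    for t in cause_times:
        # Index of the first effect event strictly later than this cause.
        i = bisect_right(effect_times, t)
        if i < len(effect_times) and effect_times[i] - t <= window:
            hits += 1
    return hits / len(cause_times)
```

Each "BBBB" record counts as one trial, and it is a hit if some "bbbb" record follows it within the window. Whether that is the right definition depends on the answers to the questions in the reply above.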

