Re: Hadoop Real time help
Bertrand Dechoux 2012-08-20, 07:37
The terms are
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing
By the way, processing streams in real time tends toward being a pleonasm.
MapReduce follows a batch architecture. You keep data until a given time.
You then process everything. And at the end you provide all the results.
Stream processing has by definition a 'smoother' throughput. Events are
processed one at a time and potentially each processing could lead to a
I don't know of any complete overview of such tools.
Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as
Then you also have other solutions which will allow you to scale such as
A few people have already considered using Storm for scalability and Esper
to do the real computation.
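To make the batch versus stream distinction above concrete, here is a minimal sketch in plain Python. It does not use Hadoop, Esper or Storm; the function names and the sample readings are made up for illustration only. The batch version keeps all the data and computes one result at the end, while the stream version updates its result as each event arrives.

```python
from typing import Iterable, Iterator


def batch_average(events: Iterable[float]) -> float:
    """Batch style: keep all the data, then process everything at once."""
    stored = list(events)  # nothing is computed until all data is collected
    return sum(stored) / len(stored)


def stream_average(events: Iterable[float]) -> Iterator[float]:
    """Stream style: process one event at a time, emitting an updated result."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # a result is available after every event


# Hypothetical sample data, for illustration only.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_average(readings)
stream_results = list(stream_average(readings))

print("batch:", batch_result)
print("stream:", stream_results)
```

Both approaches end on the same final value, but the stream version provides an intermediate result after each event instead of only at the end, which is where the lower latency comes from.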
On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
> Is there a "complete" overview of the tools that allow processing streams
> of data in realtime?
> Or even better; what are the terms to google for?
> Kind regards,
> Niels Basjes
> (Sent from mobile)
> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <[EMAIL PROTECTED]> wrote:
> That's a good question. More and more people are talking about Hadoop Real
>> One key aspect of this question is whether we are talking about MapReduce
>> or not.
>> MapReduce greatly improves the response time of any data-intensive job
>> but it is still a batch framework with a noticeable latency.
>> There are multiple ways to improve the latency:
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Big Table clones (like HBase ...)
>> * YARN with a non MapReduce application
>> * ...
>> But it will really depend on the context and the definition of 'real
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <[EMAIL PROTECTED]> wrote:
>>> Hello folks,
>>> I am new to Hadoop. I would like to know how the Hadoop framework is
>>> useful for real-time services. Can anyone explain?
>> Bertrand Dechoux