Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Hadoop Real time help


Copy link to this message
-
Re: Hadoop Real time help
The terms are
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing

By the way, processing streams in real time tends toward being a pleonasm.

MapReduce follows a batch architecture. You keep data until a given time.
You then process everything. And at the end you provide all the results.
Stream processing has by definition a more 'smooth' throughput. Each event
is processed at a time and potentially each processing could lead to a
result.

I don't know any complete overview of such tools.
Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as
Flume.

Then you also have other solutions which will allow you to scale such as
Storm.
A few people have already considered using Storm for scalability and Esper
to do the real computation.

Regards

Bertrand

On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:

> Is there a "complete" overview of the tools that allow processing streams
> of data in realtime?
>
> Or even better; what are the terms to google for?
>
> --
> Met vriendelijke groet,
> Niels Basjes
> (Verstuurd vanaf mobiel )
> Op 19 aug. 2012 18:22 schreef "Bertrand Dechoux" <[EMAIL PROTECTED]> het
> volgende:
>
> That's a good question. More and more people are talking about Hadoop Real
>> Time.
>> One key aspect of this question is whether we are talking about MapReduce
>> or not.
>>
>> MapReduce greatly improves the response time of any data intensive jobs
>> but it is still a batch framework with a noticeable latency.
>>
>> There is multiple ways to improve the latency :
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Big Table clones (like HBase ...)
>> * YARN with a non MapReduce application
>> * ...
>>
>> But it will really depend on the context and the definition of 'real
>> time'.
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <[EMAIL PROTECTED]>wrote:
>>
>>> Hello folks,
>>>
>>>
>>>    I am new to hadoop, I just want to get information that how hadoop
>>> framework is usefull for real time service.?can any one explain me..?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
--
Bertrand Dechoux
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB