Re: Hadoop Real time help
Thanks for the pointers, I have stuff to read now :)

On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> The terms are
> * ESP : http://en.wikipedia.org/wiki/Event_stream_processing
> * CEP : http://en.wikipedia.org/wiki/Complex_event_processing
>
> By the way, "processing streams in real time" is close to a pleonasm: stream
> processing already implies real time.
>
> MapReduce follows a batch architecture: you keep data until a given time,
> then process everything, and only at the end do you provide all the results.
> Stream processing has, by definition, a 'smoother' throughput: events are
> processed one at a time, and each one can potentially lead to a result.
>
> I don't know of any complete overview of such tools.
> Esper is well known in that space.
> FlumeBase was an attempt to do something similar (as far as I can tell).
> It shows how an ESP engine fits with log collection using a tool such as
> Flume.
>
> There are also other solutions, such as Storm, that allow you to scale.
> A few people have already considered using Storm for scalability and Esper
> for the actual computation.
>
> Regards
>
> Bertrand
>
>
> On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
>>
>> Is there a "complete" overview of the tools that allow processing streams
>> of data in real time?
>>
>> Or even better: what are the terms to google for?
>>
>> --
>> Kind regards,
>> Niels Basjes
>> (Sent from mobile)
>>
>> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <[EMAIL PROTECTED]> wrote:
>>
>>> That's a good question. More and more people are talking about Hadoop
>>> Real Time.
>>> One key aspect of this question is whether we are talking about MapReduce
>>> or not.
>>>
>>> MapReduce greatly improves the response time of data-intensive jobs,
>>> but it is still a batch framework with noticeable latency.
>>>
>>> There are multiple ways to improve the latency:
>>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>>> * Bigtable clones (like HBase, ...)
>>> * YARN with a non-MapReduce application
>>> * ...
>>>
>>> But it will really depend on the context and the definition of 'real
>>> time'.
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Hello folks,
>>>>
>>>>
>>>>    I am new to Hadoop. I would like to know how the Hadoop framework can be
>>>> useful for real-time services. Can anyone explain?
>>>>
>>>> Thanks.
>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>
>
>
>
> --
> Bertrand Dechoux

--
Best regards,

Niels Basjes
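
For readers of this thread, the batch versus stream distinction Bertrand describes above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `batch_count` and `stream_count` and the sample event list are hypothetical, and nothing here uses the Hadoop, Esper, FlumeBase, or Storm APIs.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def batch_count(events: Iterable[str]) -> Dict[str, int]:
    """Batch style: keep all the data, process everything, then return all results."""
    stored = list(events)          # keep data until the cut-off point
    return dict(Counter(stored))   # results exist only once everything is processed


def stream_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stream style: handle one event at a time; each event may produce a result."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield event, counts[event]  # an updated result right after each event


if __name__ == "__main__":
    log = ["login", "click", "click", "logout"]  # hypothetical sample events
    print(batch_count(log))
    for update in stream_count(log):
        print(update)
```

Both functions end up with the same counts; the difference is when results become available. The batch version answers once at the end, while the stream version can be consumed while events are still arriving.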