
MapReduce, mail # user - Hadoop Real time help


Re: Hadoop Real time help
Mohit Anchlia 2012-08-20, 16:53
One of the most common use cases is to run all IO-intensive batch jobs
on data in HDFS and then load more structured data, or the output of those
jobs, into HBase or Solr for quick access. But if your dataset is small
enough to fit into memory, you could also cache it in memory. There are
various options depending on your requirements; Bertrand has already
highlighted some of them below.
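
A minimal sketch of the "batch job, then fast lookups" pattern, assuming a toy log format of "<user> <url>" per line. The hypothetical batch_job function stands in for an IO-intensive job over HDFS, and the hypothetical ResultCache class (a plain dict) stands in for the serving store (HBase, Solr, or an in-memory cache). None of these names are HBase or Solr APIs.

```python
from collections import Counter
from typing import Dict, Iterable


def batch_job(log_lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical batch step: count hits per URL over the whole input.

    Stands in for an IO-intensive MapReduce job; assumes each line is
    "<user> <url>".
    """
    counts: Counter = Counter()
    for line in log_lines:
        _, url = line.split()
        counts[url] += 1
    return dict(counts)


class ResultCache:
    """In-memory stand-in for a serving store such as HBase or Solr."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def load(self, results: Dict[str, int]) -> None:
        # Replace the whole snapshot after each batch run.
        self._data = dict(results)

    def get(self, key: str) -> int:
        # Cheap lookup at query time; nothing is recomputed here.
        return self._data.get(key, 0)


logs = ["alice /home", "bob /home", "alice /cart"]
cache = ResultCache()
cache.load(batch_job(logs))
print(cache.get("/home"))     # 2
print(cache.get("/missing"))  # 0
```

The expensive work happens once per batch run; queries only read the precomputed snapshot, which is why the serving side can answer quickly even though the batch side has high latency.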

On Mon, Aug 20, 2012 at 12:37 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> The terms are:
> * ESP : http://en.wikipedia.org/wiki/Event_stream_processing
> * CEP : http://en.wikipedia.org/wiki/Complex_event_processing
>
> By the way, "processing streams in real time" is close to a pleonasm.
>
> MapReduce follows a batch architecture. You keep data until a given time,
> then you process everything, and at the end you provide all the results.
> Stream processing, by definition, has a 'smoother' throughput. Each event
> is processed one at a time, and each processing step could potentially
> produce a result.
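
The batch/stream contrast can be sketched with two hypothetical functions over the same toy event list (plain strings, an assumption for illustration): batch_counts buffers everything and returns one final result, while stream_counts updates its state per event and yields a result after each one.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def batch_counts(events: Iterable[str]) -> Dict[str, int]:
    """Batch style: keep all data, process it at the end, emit one result."""
    buffered: List[str] = list(events)  # hold everything until the cutoff
    result: Dict[str, int] = {}
    for e in buffered:
        result[e] = result.get(e, 0) + 1
    return result


def stream_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stream style: handle one event at a time; each event yields a result."""
    running: Dict[str, int] = {}
    for e in events:
        running[e] = running.get(e, 0) + 1
        yield e, running[e]  # up-to-date count right after this event


events = ["click", "view", "click"]
print(batch_counts(events))         # {'click': 2, 'view': 1}
print(list(stream_counts(events)))  # [('click', 1), ('view', 1), ('click', 2)]
```

Both compute the same totals; the difference is when results become available, which is what drives the latency gap between the two models.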
>
> I don't know of any complete overview of such tools.
> Esper is well known in that space.
> FlumeBase was an attempt to do something similar (as far as I can tell).
> It shows how an ESP engine fits with log collection using a tool such as
> Flume.
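
To give a feel for what a CEP engine evaluates, here is a minimal sketch of a sliding-time-window rule: fire when one key sees at least `threshold` events within `window` seconds. This is not Esper's API (Esper expresses such rules in its own EPL query language); the SlidingWindowRule class is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class SlidingWindowRule:
    """Hypothetical CEP-style rule (not Esper's API): fire when one key has
    at least `threshold` events within the last `window` seconds.
    Assumes events arrive in non-decreasing timestamp order."""

    def __init__(self, window: float, threshold: int) -> None:
        self.window = window
        self.threshold = threshold
        self._times: Dict[str, Deque[float]] = defaultdict(deque)

    def on_event(self, key: str, ts: float) -> bool:
        times = self._times[key]
        times.append(ts)
        # Evict events that have fallen out of the time window.
        while times and ts - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


rule = SlidingWindowRule(window=60.0, threshold=3)
stream: List[Tuple[str, float]] = [
    ("bob", 0.0), ("bob", 20.0), ("amy", 30.0), ("bob", 50.0), ("bob", 200.0),
]
alerts = [(k, t) for k, t in stream if rule.on_event(k, t)]
print(alerts)  # [('bob', 50.0)]
```

Each event is evaluated as it arrives, and only the events inside the window are retained, so memory stays bounded by the window size rather than the whole stream.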
>
> Then there are other solutions that allow you to scale, such as Storm.
> A few people have already considered using Storm for scalability and
> Esper to do the actual computation.
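
One way to picture "Storm for scale, a CEP engine for the computation": route each event by a hash of its key to one of several parallel workers, so all events for a key land on the same worker, similar in spirit to Storm's fields grouping. This is a single-process sketch, not Storm code; NUM_WORKERS, route and Worker are hypothetical, and Worker only keeps a per-key count where a real setup might embed an Esper engine. zlib.crc32 is used because Python's built-in str hash is randomized per process.

```python
import zlib
from typing import Dict, List

NUM_WORKERS = 4  # hypothetical parallelism


def route(key: str, n: int = NUM_WORKERS) -> int:
    """Deterministic hash partitioning: the same key always maps to the same worker."""
    return zlib.crc32(key.encode("utf-8")) % n


class Worker:
    """Stands in for one engine instance; here it only keeps a per-key count."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


workers = [Worker() for _ in range(NUM_WORKERS)]
events: List[str] = ["bob", "amy", "bob", "eve", "bob"]
for key in events:
    workers[route(key)].process(key)

# Each key's state lives in exactly one worker.
owners = {k: [i for i, w in enumerate(workers) if k in w.counts] for k in sorted(set(events))}
print(owners)
print(workers[route("bob")].counts["bob"])  # 3
```

Because per-key state never has to be shared between workers, each worker can run its rules independently, and adding workers spreads the keys (and the load) more thinly.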
>
> Regards
>
> Bertrand
>
>
> On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
>
>> Is there a "complete" overview of the tools that allow processing streams
>> of data in real time?
>>
>> Or even better, what are the terms to google for?
>>
>> --
>> Kind regards,
>> Niels Basjes
>> (Sent from mobile)
>> On 19 Aug 2012 18:22, "Bertrand Dechoux" <[EMAIL PROTECTED]> wrote:
>>
>>> That's a good question. More and more people are talking about Hadoop
>>> Real Time.
>>> One key aspect of this question is whether we are talking about
>>> MapReduce or not.
>>>
>>> MapReduce greatly improves the response time of any data-intensive job,
>>> but it is still a batch framework with noticeable latency.
>>>
>>> There are multiple ways to improve the latency:
>>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>>> * Bigtable clones (like HBase, ...)
>>> * YARN with a non-MapReduce application
>>> * ...
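
Part of why Bigtable clones such as HBase help with latency is that rows are kept sorted by row key, so a point get or a prefix scan touches only the relevant rows instead of reprocessing the whole dataset. The toy, single-node SortedTable below is a hypothetical illustration of that idea only; real stores use very different on-disk structures, and the row-key layout "user#date" is an assumption.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedTable:
    """Toy model of a Bigtable-style table: rows sorted by row key, so point
    gets and prefix scans avoid a full pass over the data."""

    def __init__(self) -> None:
        self._keys: List[str] = []            # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row: str, column: str, value: str) -> None:
        if row not in self._rows:
            bisect.insort(self._keys, row)    # keep keys sorted on insert
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str) -> Optional[Dict[str, str]]:
        return self._rows.get(row)

    def scan_prefix(self, prefix: str) -> List[Tuple[str, Dict[str, str]]]:
        # Binary search to the first key >= prefix, then read while keys match.
        i = bisect.bisect_left(self._keys, prefix)
        out: List[Tuple[str, Dict[str, str]]] = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            out.append((key, self._rows[key]))
            i += 1
        return out


t = SortedTable()
t.put("user1#2012-08-19", "clicks", "5")
t.put("user2#2012-08-19", "clicks", "1")
t.put("user1#2012-08-20", "clicks", "7")
print([k for k, _ in t.scan_prefix("user1#")])  # ['user1#2012-08-19', 'user1#2012-08-20']
print(t.get("user2#2012-08-19"))                # {'clicks': '1'}
```

The row-key design decides which queries are cheap: here, "all rows for one user" is a contiguous range, while "all users on one date" would still require a full scan.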
>>>
>>> But it will really depend on the context and the definition of 'real
>>> time'.
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello folks,
>>>>
>>>>
>>>> I am new to Hadoop. I just want to know how the Hadoop framework is
>>>> useful for real-time services. Can anyone explain?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>
>
> --
> Bertrand Dechoux
>