-
Hadoop Real time help (mahout user, 2012-08-19, 15:44)
Hello folks,
I am new to Hadoop. I would like to understand how the Hadoop framework can be useful for a real-time service. Can anyone explain? Thanks.
-
Re: Hadoop Real time help (Bertrand Dechoux, 2012-08-19, 16:22)
That's a good question. More and more people are talking about "Hadoop real time". One key aspect of this question is whether we are talking about MapReduce or not.
MapReduce greatly improves the response time of any data-intensive job, but it is still a batch framework with noticeable latency.
There are multiple ways to improve the latency:
* ESP/CEP solutions (like Esper, FlumeBase, ...)
* Bigtable clones (like HBase, ...)
* YARN with a non-MapReduce application
* ...
But it will really depend on the context and on the definition of 'real time'.
Regards
Bertrand
-
Re: Hadoop Real time help (Mohit Anchlia, 2012-08-19, 16:36)
Can you describe your use case? Each use case calls for different design considerations.
-
Re: Hadoop Real time help (mahout user, 2012-08-19, 19:06)
Thanks Mohit and Bertrand,
I am looking into Hadoop for a search engine, like many others. For search engines I know Lucene exists. But in my case I have implemented Java classes that search databases as well as CSV files. What I can't understand is how, with gigabytes of data, I can get a real-time search service with Hadoop.
-
Re: Hadoop Real time help (Bertrand Dechoux, 2012-08-19, 19:34)
Lucene allows you to build a kind of inverted index, mapping content to document identifiers. Solr or ElasticSearch allow you to scale that process.
However, if I am reading you correctly, you are saying that you cannot precompute a structure (such as an index) before the search? If that's true and you need to process GBs of data, then you have to accept some latency unless you can hold everything in memory before the search itself.
I can't say anything more precise; it will depend on your context. One may ask: why can't you index the content of your database and your files?
Bertrand
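To make the inverted-index idea concrete, here is a minimal in-memory sketch in Python. It is not Lucene's API or data layout: the tokenizer (lowercase, split on non-alphanumerics), the AND semantics of `search`, and the names `build_index`/`search` are illustrative assumptions. The point is that the expensive scan over every document happens once, up front, and each query then only touches the posting sets for its own terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters (a simplistic, assumed tokenizer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    """Build an inverted index: term -> set of document identifiers containing it.

    `docs` maps a document id (e.g. a database primary key or a CSV row number)
    to its text. This is the precomputation step, done once before any search.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the smallest posting set so the intersections stay small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result


# Hypothetical documents pulled from a database and a CSV file.
docs = {
    "db:1": "Hadoop is a batch processing framework",
    "db:2": "HBase offers low latency random reads",
    "csv:7": "Storm processes streams of events in real time",
}
index = build_index(docs)
print(sorted(search(index, "real time")))     # ['csv:7']
print(sorted(search(index, "batch hadoop")))  # ['db:1']
```

Lucene adds much more on top of this (analyzers, relevance scoring, on-disk segments), and Solr or ElasticSearch distribute the index across machines, but the lookup principle is the same.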
-
Re: Hadoop Real time help (Niels Basjes, 2012-08-19, 19:44)
Is there a "complete" overview of the tools that allow processing streams of data in real time? Or, even better, what are the terms to google for?
--
Kind regards,
Niels Basjes
(Sent from mobile)
-
Re: Hadoop Real time help (Bertrand Dechoux, 2012-08-20, 07:37)
The terms are:
* ESP: http://en.wikipedia.org/wiki/Event_stream_processing
* CEP: http://en.wikipedia.org/wiki/Complex_event_processing
By the way, "processing streams in real time" tends toward being a pleonasm.
MapReduce follows a batch architecture. You keep data until a given time. You then process everything. And at the end you provide all the results. Stream processing, by definition, has a 'smoother' throughput. Events are processed one at a time, and each processing step could potentially lead to a result.
I don't know of any complete overview of such tools. Esper is well known in that space. FlumeBase was an attempt to do something similar (as far as I can tell). It shows how an ESP engine fits with log collection using a tool such as Flume.
Then you also have other solutions that allow you to scale, such as Storm. A few people have already considered using Storm for scalability and Esper to do the actual computation.
Regards
Bertrand
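The batch/stream contrast described above can be sketched in a few lines of Python. This is a toy illustration, not how MapReduce, Esper or Storm are implemented: the event shape (`(user, amount)` tuples) and the running per-user total are assumptions chosen for brevity. The batch function returns nothing until it has seen all the data; the streaming function updates its state per event and can emit a result immediately.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, int]  # hypothetical event shape: (user, amount)


def batch_totals(events: Iterable[Event]) -> Counter:
    """Batch style: accumulate all events first, then compute every result at the end."""
    stored = list(events)          # keep data until a given time...
    totals = Counter()
    for user, amount in stored:    # ...then process everything at once
        totals[user] += amount
    return totals                  # all results become available together


def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stream style: process events one at a time, emitting an updated result per event."""
    totals = Counter()
    for user, amount in events:
        totals[user] += amount
        yield user, totals[user]   # a result is available as soon as the event arrives


events = [("alice", 5), ("bob", 3), ("alice", 2)]
print(dict(batch_totals(events)))       # {'alice': 7, 'bob': 3}
print(list(streaming_totals(events)))   # [('alice', 5), ('bob', 3), ('alice', 7)]
```

Both produce the same final totals; the difference is when results become visible, which is what "latency" means in this thread.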
-
Re: Hadoop Real time help (Mohit Anchlia, 2012-08-20, 16:53)
One of the most common use cases is to run all I/O-intensive batch jobs over data in HDFS and load more structured data, or the output of those jobs, into HBase or Solr for quick access. But if your dataset is small enough to fit into memory, then you could also cache it in memory. There are various options depending on your requirements; Bertrand has already highlighted some of them.
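A rough sketch of this "batch, then serve" pattern, assuming nothing about HBase's or Solr's actual APIs: an offline batch step does the heavy scan and writes compact, query-ready results to a serving store, and online requests only do key lookups. Here a plain Python dict stands in for HBase or Solr (or for an in-memory cache when the data fits in RAM); the CSV layout (`product,region,sales`) and the names `batch_job` and `ServingStore` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; in practice this would be large and live in HDFS.
RAW_CSV = """product,region,sales
widget,eu,10
widget,us,4
gadget,eu,7
widget,eu,1
"""


def batch_job(raw_csv: str) -> dict:
    """Offline, I/O-heavy step: scan every row once and precompute per-product totals."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        totals[row["product"]] += int(row["sales"])
    return dict(totals)


class ServingStore:
    """Stand-in for a low-latency store (e.g. an HBase table or Solr index) or an in-memory cache."""

    def __init__(self):
        self._data = {}

    def bulk_load(self, results: dict) -> None:
        # Replace the served view with the latest batch output.
        self._data = dict(results)

    def get(self, key: str, default=None):
        # Online request path: a single key lookup, no scan of the raw data.
        return self._data.get(key, default)


store = ServingStore()
store.bulk_load(batch_job(RAW_CSV))
print(store.get("widget"))  # 15
print(store.get("gadget"))  # 7
```

The trade-off is freshness: served results are only as current as the last batch run, which is where the stream-processing options discussed in this thread come in.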
-
Re: Hadoop Real time help (Niels Basjes, 2012-08-22, 18:21)
Thanks for the pointers, I have stuff to read now :)
--
Best regards,
Niels Basjes