Is continuous map reduce supported
Stephen Mullins 2010-08-24, 16:55
Hello,
I have not used Hadoop but am researching it for an analytics project. I would like to know if Hadoop supports continuous or incremental map reduce functionality. If not, are there any plans to add it?
Thanks, Stephen
Re: Is continuous map reduce supported
Harsh J 2010-08-24, 18:05
There's Chain-Mapping and Chain-Reducing available. With good docs:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html

However, something as simple as Twister (which has iterative
mapreduces based on a while-like condition loop) isn't directly
available. One sometimes needs to chain jobs together to achieve this
in pure-Hadoop.

Projects like Hive, Pig, and Cascading help with this a bit (plan
building, optimization of plan, execution, etc.).

On Tue, Aug 24, 2010 at 10:25 PM, Stephen Mullins <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have not used Hadoop but am researching it for an analytics project. I
> would like to know if Hadoop supports continuous or incremental map reduce
> functionality. If not, are there any plans to add it?
>
> Thanks,
> Stephen

--
Harsh J
www.harshj.com
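The "chain jobs together" approach mentioned above can be pictured as a driver loop that reruns a job until its output stops changing. What follows is only a minimal in-memory sketch in plain Python, not Hadoop code: `run_job`, `mapper`, `reducer`, and `iterate_until_stable` are hypothetical names made up for this illustration, and the example task (propagating the smallest label through a graph) is an assumption chosen because it needs several rounds to converge.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map -> shuffle -> reduce pass over (key, value) records."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Sort by key so that two rounds can be compared for equality.
    return sorted((key, reducer(key, values)) for key, values in groups.items())


def mapper(node, state):
    """Pass the node's own record through and send its label to each neighbor."""
    label, neighbors = state
    yield node, (label, neighbors)
    for neighbor in neighbors:
        yield neighbor, (label, None)


def reducer(node, values):
    """Keep the smallest label seen and the node's original neighbor list."""
    label = min(value[0] for value in values)
    neighbors = next(value[1] for value in values if value[1] is not None)
    return (label, neighbors)


def iterate_until_stable(records, max_rounds=20):
    """Driver loop: chain one job after another until the output is unchanged."""
    for rounds in range(1, max_rounds + 1):
        new_records = run_job(records, mapper, reducer)
        if new_records == records:
            return new_records, rounds
        records = new_records
    return records, max_rounds


# Hypothetical input: two components, 1-2-3 and 4-5. Each record is
# (node, (current_label, neighbors)), listed in key order.
graph = [
    (1, (1, (2,))),
    (2, (2, (1, 3))),
    (3, (3, (2,))),
    (4, (4, (5,))),
    (5, (5, (4,))),
]

result, rounds = iterate_until_stable(graph)
labels = {node: state[0] for node, state in result}
print(labels, rounds)
```

In a real cluster each pass would be a separate job reading the previous job's output, so every round pays job startup and I/O costs; that overhead is the motivation for the iterative frameworks discussed in this thread.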
Re: Is continuous map reduce supported
Jeff Hammerbacher 2010-09-01, 10:35
Hey Stephen,

There have been several proposals for implementing such a feature. See
https://issues.apache.org/jira/browse/MAPREDUCE-1211 for an implementation
from Berkeley, now maintained at http://code.google.com/p/hop. The paper
at https://www.ideals.illinois.edu/handle/2142/14819 describes a similar
approach.

Incremental bulk processing is another approach. See
http://doi.acm.org/10.1145/1807128.1807138 for a system built on top of
Hadoop, and http://research.microsoft.com/apps/pubs/default.aspx?id=117830
for a system built on top of Dryad.

The blog post at http://clue.cs.washington.edu/node/14 describes a paper
accepted at VLDB this year which improves the performance of Hadoop
MapReduce for iterative tasks, and may be applicable to your research.

Lastly, for more CEP-like approaches, you can check out C-MR from Brown
(ftp://ftp.cs.brown.edu/pub/techreports/10/cs10-01.pdf) and Continuous
MapReduce from UCSD: http://www.christrezzo.com/ctrezzo-thesis.pdf.

As for actually being implemented in Hadoop MapReduce: the Apache project
seems to have settled in to focus on stability rather than evolving new
features.

Thanks,
Jeff

On Tue, Aug 24, 2010 at 11:05 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> There's Chain-Mapping and Chain-Reducing available. With good docs:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html
>
> However, something as simple as Twister (which has iterative
> mapreduces based on a while-like condition loop) isn't directly
> available. One sometimes needs to chain jobs together to achieve this
> in pure-Hadoop.
>
> Projects like Hive, Pig, and Cascading help with this a bit (plan
> building, optimization of plan, execution, etc.).
>
> On Tue, Aug 24, 2010 at 10:25 PM, Stephen Mullins <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I have not used Hadoop but am researching it for an analytics project. I
> > would like to know if Hadoop supports continuous or incremental map reduce
> > functionality. If not, are there any plans to add it?
> >
> > Thanks,
> > Stephen
>
> --
> Harsh J
> www.harshj.com
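The idea behind incremental processing, as opposed to rerunning a job over all data, can be sketched as keeping the previous reduce output as state and folding each new batch into it. The following is a minimal plain-Python illustration, not taken from any of the systems linked above: `map_batch`, `reduce_counts`, and `IncrementalWordCount` are hypothetical names, and the approach assumes a reduce function (here, summing counts) that is associative and commutative, so merging partial results gives the same answer as a full recomputation.

```python
from collections import Counter


def map_batch(lines):
    """Map phase: emit (word, 1) for every word in a batch of lines."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce phase: sum the values emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


class IncrementalWordCount:
    """Keeps the running reduce output so only new input is processed."""

    def __init__(self):
        self.state = Counter()

    def ingest(self, lines):
        # Process only the new batch, then merge it into the saved state.
        self.state.update(reduce_counts(map_batch(lines)))
        return self.state


# Hypothetical input arriving in two batches.
batch1 = ["map reduce", "map"]
batch2 = ["reduce reduce hadoop"]

incremental = IncrementalWordCount()
incremental.ingest(batch1)
incremental.ingest(batch2)

# Full recomputation over all data, for comparison.
full = reduce_counts(map_batch(batch1 + batch2))
print(incremental.state == full)
```

Reducers that are not mergeable in this way (a median, for example) would need extra state or a different strategy, which is part of what makes the general problem hard.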
Re: Is continuous map reduce supported
Owen O'Malley 2010-09-01, 16:56
On Wed, Sep 1, 2010 at 3:35 AM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
>
> There have been several proposals for implementing such a feature.

Thanks for the great breakdown of the relevant work.

> As for actually being implemented in Hadoop MapReduce: the Apache project
> seems to have settled in to focus on stability rather than evolving new
> features.

That isn't true. We are actively adding new features. However, there
is certainly a focus on doing MapReduce well rather than trying to
implement all potential distributed computation paradigms. I suspect
that the right solution is doing two levels like Mesos:

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-87.html

Where MapReduce is one framework running on the cluster. That way you
can keep the framework stable, support MapReduce well, and let other
users explore other idioms.

-- Owen
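The "two levels" idea can be pictured roughly as follows: a cluster-wide layer offers free resources, and each framework (MapReduce being one of them) decides for itself what to run on the offer. This is a toy plain-Python sketch and not the Mesos API; `Framework`, `consider_offer`, and `allocate` are hypothetical names, and offering resources to frameworks in a fixed order is a simplifying assumption rather than how a real allocator chooses who gets an offer.

```python
class Framework:
    """A scheduler for one computation model sharing the cluster."""

    def __init__(self, name, cpus_per_task, pending_tasks):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.pending_tasks = pending_tasks

    def consider_offer(self, cpus):
        """Second level: the framework decides how much of the offer to use."""
        tasks = min(self.pending_tasks, cpus // self.cpus_per_task)
        self.pending_tasks -= tasks
        return tasks * self.cpus_per_task  # CPUs accepted from the offer


def allocate(total_cpus, frameworks):
    """First level: offer the currently free CPUs to each framework in turn."""
    free = total_cpus
    usage = {}
    for framework in frameworks:
        accepted = framework.consider_offer(free)
        usage[framework.name] = accepted
        free -= accepted
    return usage, free


# Hypothetical cluster with 10 CPUs shared by two frameworks.
frameworks = [
    Framework("mapreduce", cpus_per_task=2, pending_tasks=3),
    Framework("iterative", cpus_per_task=3, pending_tasks=5),
]
usage, free = allocate(10, frameworks)
print(usage, free)
```

The point of the split is that the resource layer never needs to understand either computation model, so a stable MapReduce and an experimental framework can coexist on the same machines.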
Re: Is continuous map reduce supported
Lance Norskog 2010-09-02, 03:33
Dead link:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html

On Wed, Sep 1, 2010 at 9:56 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 1, 2010 at 3:35 AM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
>>
>> There have been several proposals for implementing such a feature.
>
> Thanks for the great breakdown of the relevant work.
>
>> As for actually being implemented in Hadoop MapReduce: the Apache project
>> seems to have settled in to focus on stability rather than evolving new
>> features.
>
> That isn't true. We are actively adding new features. However, there
> is certainly a focus on doing MapReduce well rather than trying to
> implement all potential distributed computation paradigms. I suspect
> that the right solution is doing two levels like Mesos:
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-87.html
>
> Where MapReduce is one framework running on the cluster. That way you
> can keep the framework stable, support MapReduce well, and let other
> users explore other idioms.
>
> -- Owen

--
Lance Norskog
[EMAIL PROTECTED]
Re: Is continuous map reduce supported
Ted Yu 2010-09-02, 03:48
Try this:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/ChainReducer.html

On Wed, Sep 1, 2010 at 8:33 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Dead link:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html
>
> On Wed, Sep 1, 2010 at 9:56 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> > On Wed, Sep 1, 2010 at 3:35 AM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
> >>
> >> There have been several proposals for implementing such a feature.
> >
> > Thanks for the great breakdown of the relevant work.
> >
> >> As for actually being implemented in Hadoop MapReduce: the Apache project
> >> seems to have settled in to focus on stability rather than evolving new
> >> features.
> >
> > That isn't true. We are actively adding new features. However, there
> > is certainly a focus on doing MapReduce well rather than trying to
> > implement all potential distributed computation paradigms. I suspect
> > that the right solution is doing two levels like Mesos:
> >
> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-87.html
> >
> > Where MapReduce is one framework running on the cluster. That way you
> > can keep the framework stable, support MapReduce well, and let other
> > users explore other idioms.
> >
> > -- Owen
>
> --
> Lance Norskog
> [EMAIL PROTECTED]
Re: Is continuous map reduce supported
Jeff Hammerbacher 2010-09-03, 05:13
> That isn't true. We are actively adding new features. However, there
> is certainly a focus on doing MapReduce well rather than trying to
> implement all potential distributed computation paradigms. I suspect
> that the right solution is doing two levels like Mesos:
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-87.html
>
> Where MapReduce is one framework running on the cluster. That way you
> can keep the framework stable, support MapReduce well, and let other
> users explore other idioms.

I completely agree. I think it's a good thing that Hadoop MapReduce is
focused on doing MapReduce well and leaving experimentation to alternative
frameworks which could run on the same cluster as MapReduce via a system
like Mesos.
Re: Is continuous map reduce supported
Wei Xue 2010-09-03, 06:01
Thanks Jeff. Those are all valuable links. It seems there are quite a few
people out there working on incremental MapReduce.

2010/9/1 Jeff Hammerbacher <[EMAIL PROTECTED]>

> Hey Stephen,
>
> There have been several proposals for implementing such a feature. See
> https://issues.apache.org/jira/browse/MAPREDUCE-1211 for an implementation
> from Berkeley, now maintained at http://code.google.com/p/hop. The paper
> at https://www.ideals.illinois.edu/handle/2142/14819 describes a similar
> approach.
>
> Incremental bulk processing is another approach. See
> http://doi.acm.org/10.1145/1807128.1807138 for a system built on top of
> Hadoop, and http://research.microsoft.com/apps/pubs/default.aspx?id=117830
> for a system built on top of Dryad.
>
> The blog post at http://clue.cs.washington.edu/node/14 describes a paper
> accepted at VLDB this year which improves the performance of Hadoop
> MapReduce for iterative tasks, and may be applicable to your research.
>
> Lastly, for more CEP-like approaches, you can check out C-MR from Brown
> (ftp://ftp.cs.brown.edu/pub/techreports/10/cs10-01.pdf) and Continuous
> MapReduce from UCSD: http://www.christrezzo.com/ctrezzo-thesis.pdf.
>
> As for actually being implemented in Hadoop MapReduce: the Apache project
> seems to have settled in to focus on stability rather than evolving new
> features.
>
> Thanks,
> Jeff
>
> On Tue, Aug 24, 2010 at 11:05 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> There's Chain-Mapping and Chain-Reducing available. With good docs:
>>
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html
>>
>> However, something as simple as Twister (which has iterative
>> mapreduces based on a while-like condition loop) isn't directly
>> available. One sometimes needs to chain jobs together to achieve this
>> in pure-Hadoop.
>>
>> Projects like Hive, Pig, and Cascading help with this a bit (plan
>> building, optimization of plan, execution, etc.).
>>
>> On Tue, Aug 24, 2010 at 10:25 PM, Stephen Mullins <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> >
>> > I have not used Hadoop but am researching it for an analytics project. I
>> > would like to know if Hadoop supports continuous or incremental map reduce
>> > functionality. If not, are there any plans to add it?
>> >
>> > Thanks,
>> > Stephen
>>
>> --
>> Harsh J
>> www.harshj.com