MapReduce >> mail # user >> output from one map reduce job as the input to another map reduce job?


Re: output from one map reduce job as the input to another map reduce job?
Have you considered Oozie for this? It's a workflow engine developed by
Yahoo! engineers.
Yahoo/oozie at GitHub
https://github.com/yahoo/oozie

Oozie at InfoQ
http://www.infoq.com/articles/introductionOozie

Oozie examples:
http://www.infoq.com/articles/oozieexample
http://yahoo.github.com/oozie/releases/2.3.0/DG_Examples.html

Oozie at Cloudera
https://ccp.cloudera.com/display/CDHDOC/Oozie+Installation

Regards

2011/9/27 Arko Provo Mukherjee <[EMAIL PROTECTED]>

> Hi,
>
> I am not sure how you can avoid the filesystem; however, I did it as
> follows:
>
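> // For job 1: reads args[0], writes args[1]
> FileInputFormat.addInputPath(job1, new Path(args[0]));
> FileOutputFormat.setOutputPath(job1, new Path(args[1]));
>
> // For job 2: reads job 1's output (args[1]), so it must start only
> // after job 1 has completed; writes args[2]
> FileInputFormat.addInputPath(job2, new Path(args[1]));
> FileOutputFormat.setOutputPath(job2, new Path(args[2]));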
>
> Assuming:
> args[0] --> Input to first mapper
> args[1] --> Output of first reducer / Input to second mapper
> args[2] --> Output of second reducer
>
> Hope this helps!
> Warm regards
> Arko
>
> On Tue, Sep 27, 2011 at 2:09 PM, Kevin Burton <[EMAIL PROTECTED]> wrote:
> > Is it possible to connect the output of one map reduce job so that it is
> > the input to another map reduce job?
> > Basically… then reduce() outputs a key that will be passed to another
> > map() function without having to store intermediate data to the filesystem.
> > Kevin
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> >
> > Location: San Francisco, CA
> > Skype: burtonator
> >
> > Skype-in: (415) 871-0687
> >
>
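
The path wiring in Arko's quoted snippet can be sketched as a small driver loop. This is only an illustration of the ordering, not Hadoop code: the `Stage` interface and the `JobChain` class are hypothetical stand-ins for a configured job, and the path names in `main` are made up. In a real driver, each stage would presumably set its paths with FileInputFormat.addInputPath / FileOutputFormat.setOutputPath and then block on job.waitForCompletion(true).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one configured MapReduce job; returns true on success. */
interface Stage {
    boolean run(String inputPath, String outputPath);
}

final class JobChain {
    /**
     * Runs the stages in order. paths must hold stages.size() + 1 entries:
     * paths[i] is the input of stage i and paths[i + 1] is its output,
     * so each stage reads what the previous stage wrote.
     * Returns the number of stages that completed successfully.
     */
    static int runChain(List<Stage> stages, String[] paths) {
        if (paths.length != stages.size() + 1) {
            throw new IllegalArgumentException(
                "need one path per stage plus the final output");
        }
        int completed = 0;
        for (int i = 0; i < stages.size(); i++) {
            // Assumption: a real driver would block here until the job finishes.
            if (!stages.get(i).run(paths[i], paths[i + 1])) {
                break; // do not start the next job on missing or partial input
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Stage> stages = new ArrayList<>();
        // Each lambda only records what a real job would read and write.
        stages.add((in, out) -> { log.add("job1: " + in + " -> " + out); return true; });
        stages.add((in, out) -> { log.add("job2: " + in + " -> " + out); return true; });

        // Same layout as args[0], args[1], args[2] in the quoted message.
        String[] paths = {"input", "intermediate", "output"};
        int done = runChain(stages, paths);
        System.out.println(done + " jobs completed: " + log);
    }
}
```

Running `main` prints `2 jobs completed: [job1: input -> intermediate, job2: intermediate -> output]`. Note that this still goes through the filesystem for the intermediate data, as in the quoted approach; it only makes the ordering and the shared path explicit.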

--
Marcos Luis Ortíz Valmaseda
 Linux Infrastructure Engineer
 Linux User # 418229
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186