Hadoop >> mail # general >> Chaining MapReduce Jobs


Chaining MapReduce Jobs
Hello,

I would like to run a Hadoop program composed of
Map1-Red1->Map2-Red2->Map3-Red3. I've read "Hadoop in Action" and several
articles online, but all of them are either based on API <= 0.20 or
contain only a few lines of code.

I'm working with Hadoop 1.0.3 and I think the best solution is to use the
JobControl class, but I haven't found a good example of it.

In my application the MapReduce jobs are executed in sequence,
so it should be possible to run the first job, then the second and finally
the third. The problem is that I have to set the input and output
directories for the second job, and I don't know how to link
the output of job1 to the input of job2.
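
To make the question concrete, here is a minimal sketch of the sequential approach, assuming each job simply reads the directory the previous one wrote. It is plain Java with no Hadoop dependency, so it only models the path wiring. The class JobChain, the Stage interface, the "stageN" directory naming and all paths are hypothetical. The comments mark where a real driver would presumably call FileInputFormat.addInputPath, FileOutputFormat.setOutputPath and Job.waitForCompletion.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: a linear chain where stage i+1 reads the directory stage i wrote. */
class JobChain {

    /** One stage (hypothetical); a real driver would configure and run a Hadoop Job here. */
    interface Stage {
        boolean run(String inputDir, String outputDir) throws Exception;
    }

    /** Returns one {input, output} directory pair per stage. */
    static List<String[]> planPaths(String input, String workDir,
                                    String finalOutput, int stages) {
        List<String[]> plan = new ArrayList<String[]>();
        String in = input;
        for (int i = 1; i <= stages; i++) {
            // Intermediate results go under workDir; the last stage writes finalOutput.
            String out = (i == stages) ? finalOutput : workDir + "/stage" + i;
            plan.add(new String[] { in, out });
            in = out; // the next stage reads what this stage wrote
        }
        return plan;
    }

    /** Runs the stages in order and returns how many succeeded; stops at the first failure. */
    static int runChain(List<Stage> stages, String input, String workDir,
                        String finalOutput) throws Exception {
        List<String[]> plan = planPaths(input, workDir, finalOutput, stages.size());
        int done = 0;
        for (int i = 0; i < stages.size(); i++) {
            String[] io = plan.get(i);
            // With Hadoop, this is where the job would be wired up, e.g.:
            //   FileInputFormat.addInputPath(job, new Path(io[0]));
            //   FileOutputFormat.setOutputPath(job, new Path(io[1]));
            //   boolean ok = job.waitForCompletion(true);
            if (!stages.get(i).run(io[0], io[1])) {
                break;
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        // Print the planned wiring for a three-job chain (hypothetical paths).
        for (String[] io : planPaths("/data/in", "/tmp/chain", "/data/out", 3)) {
            System.out.println(io[0] + " -> " + io[1]);
        }
    }
}
```

Since waitForCompletion blocks until the job finishes, a plain loop like this may be enough for a strictly linear chain. Hadoop refuses to start a job whose output directory already exists, so the intermediate directories would need to be fresh on every run.
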

Any suggestions or resources for solving this problem? Even source code on
GitHub would help.

Thanks
Claudio