Chaining MapReduce Jobs
Claudio Reggiani 2012-11-08, 13:03
I would like to run a Hadoop program composed of three chained jobs:
Map1-Red1 -> Map2-Red2 -> Map3-Red3. I've read "Hadoop in Action" and several
articles online, but all of them are either based on API versions <= 0.20 or
contain just a few lines of code.
I'm working with Hadoop 1.0.3 and I think the best solution is to use the
JobControl class, but I haven't found a good example of it.
In my particular application the MapReduce jobs are executed in sequence,
so it should be possible to run the first job, then the second and finally
the third. The problem is that I need to set the input and output
directories for the second job, which means linking the output of job1 to
the input of job2, and I don't know how to do that.
Any suggestions or resources to solve this problem? Even source code on
GitHub would help.
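
To make the sequential case concrete, here is a minimal sketch of the
wiring in question, assuming the new API (org.apache.hadoop.mapreduce) as
shipped with Hadoop 1.0.3. It does not use JobControl; it simply runs the
jobs one after another and passes the output directory of each job as the
input directory of the next. The class name ChainedJobs, the stageOutput
helper and the "jobN-out" directory names are hypothetical, and the
Map1/Red1 classes are left as commented placeholders, so as written each
job falls back to Hadoop's default identity mapper and reducer.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {

    // Hypothetical naming scheme for the intermediate directories.
    static Path stageOutput(Path workDir, int stage) {
        return new Path(workDir, "job" + stage + "-out");
    }

    // Builds one job that reads from 'in' and writes to 'out'.
    static Job buildJob(Configuration conf, String name, Path in, Path out)
            throws IOException {
        Job job = new Job(conf, name);
        job.setJarByClass(ChainedJobs.class);
        // Stage-specific classes would be set here, for example:
        // job.setMapperClass(Map1.class);
        // job.setReducerClass(Red1.class);
        // Without them the default identity mapper and reducer are used.
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);    // input of job1
        Path workDir = new Path(args[1]);  // parent of intermediate outputs
        Path finalOut = new Path(args[2]); // output of job3

        Path out1 = stageOutput(workDir, 1);
        Path out2 = stageOutput(workDir, 2);

        // waitForCompletion blocks until the job ends and returns false
        // on failure, so a later job never starts on a broken input.
        Job job1 = buildJob(conf, "job1", input, out1);
        if (!job1.waitForCompletion(true)) System.exit(1);

        // The output directory of job1 is the input directory of job2.
        Job job2 = buildJob(conf, "job2", out1, out2);
        if (!job2.waitForCompletion(true)) System.exit(1);

        Job job3 = buildJob(conf, "job3", out2, finalOut);
        System.exit(job3.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that Hadoop refuses to start a job whose output directory already
exists, so the intermediate directories have to be removed between runs.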