Chaining MapReduce Jobs
Claudio Reggiani 2012-11-08, 13:03
Hello,
I would like to run a Hadoop program composed of Map1-Red1->Map2-Red2->Map3-Red3. I've read "Hadoop in Action" and several articles online, but all of them are either based on API <= 0.20 or have just a few lines of code.
I'm working with Hadoop 1.0.3 and I think the best solution is to use the JobControl class, but I haven't found a good example of it.
In my particular application the MapReduce jobs are executed in sequence, so it should be possible to run the first job, then the second, and finally the third. The problem is that I need to set the input and output directories for the second job, and I don't know how to link the output of job1 to the input of job2.
Any suggestions or resources to solve this problem? Even source code on GitHub would be good.
Thanks,
Claudio
Re: Chaining MapReduce Jobs
Michael Segel 2012-11-08, 19:12
Have you looked at the ToolRunner class?
On Nov 8, 2012, at 7:03 AM, Claudio Reggiani <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I would like to run an Hadoop program which is composed by
> Map1-Red1->Map2-Red2->Map3-Red3. I've read "Hadoop in Action" and several
> articles online, but all of them are either based on API <= 0.20 or they
> have just few lines of code.
>
> I'm working with Hadoop 1.0.3 and I think the best solution is to use
> JobControl class, but I haven't found one good example for that.
>
> In my particular application the MapReduce Jobs are executed in sequence,
> so it could be possible to run the first job, then the second and finally
> the third one. The problem is that I need to set the input and output
> directory for the second but it doesn't make sense because I should link
> the output of job1 with the input of job2 and I don't know how to do that.
>
> Any suggestion or resource to solve this problem? Even a source code in
> github is good.
>
> Thanks
> Claudio
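
For a strictly sequential chain like the one described in the question, one way to link the jobs is to choose the intermediate directories up front, so that each job reads the directory the previous one wrote, and to stop as soon as a job fails. Below is a minimal sketch of that wiring in plain Java, with no Hadoop dependency so it runs on its own. The class name ChainPlan, the StageRunner interface, the job names and all paths are made up for illustration. In a real Hadoop 1.0.3 driver the runner would presumably build a Job, call FileInputFormat.addInputPath and FileOutputFormat.setOutputPath with the stage's directories, and return job.waitForCompletion(true), wrapping the checked exceptions those calls throw.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainPlan {

    /** One step of the chain: reads inputDir, writes outputDir. */
    public static final class Stage {
        public final String name;
        public final String inputDir;
        public final String outputDir;

        Stage(String name, String inputDir, String outputDir) {
            this.name = name;
            this.inputDir = inputDir;
            this.outputDir = outputDir;
        }
    }

    /** Hypothetical hook: runs one stage and reports whether it succeeded. */
    public interface StageRunner {
        boolean run(Stage stage);
    }

    /**
     * Wires the stages so that stage i+1 reads what stage i wrote.
     * Intermediate results go under workDir; the last stage writes finalOutput.
     */
    public static List<Stage> plan(String input, String workDir,
                                   String finalOutput, String... names) {
        List<Stage> stages = new ArrayList<>();
        String in = input;
        for (int i = 0; i < names.length; i++) {
            boolean last = (i == names.length - 1);
            String out = last ? finalOutput : workDir + "/" + names[i] + "-out";
            stages.add(new Stage(names[i], in, out));
            in = out; // the next stage consumes this stage's output
        }
        return stages;
    }

    /** Runs the stages in order, stopping at the first failure.
     *  Returns the number of stages that completed successfully. */
    public static int runAll(List<Stage> stages, StageRunner runner) {
        int done = 0;
        for (Stage stage : stages) {
            if (!runner.run(stage)) {
                break; // later stages would read incomplete input
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        // Illustrative paths only; a real driver would take them from args.
        List<Stage> stages =
            plan("/data/in", "/tmp/chain", "/data/out", "job1", "job2", "job3");
        runAll(stages, s -> {
            // Stand-in for configuring and submitting a real MapReduce job.
            System.out.println(s.name + ": " + s.inputDir + " -> " + s.outputDir);
            return true;
        });
    }
}
```

JobControl becomes more attractive when the jobs form a graph with independent branches; for a straight line of three jobs, blocking on each job in turn as above is usually enough.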