MapReduce, mail # user - Multi-stage map/reduce jobs


Re: Multi-stage map/reduce jobs
Bertrand Dechoux 2012-11-24, 11:56
I will second Harsh about JobControl.

It is indeed not the role of Hadoop to provide a full workflow engine in
its core, but JobControl allows you to define a graph of dependent jobs and
run them as one from a programmatic point of view. Of course, compared with
a Cascade in Cascading, you remain responsible for cleaning up
'temporary' results and building your own 'results cache'.

http://hadoop.apache.org/docs/r1.0.4/api/index.html?org/apache/hadoop/mapred/jobcontrol/JobControl.html
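
To make the cleanup responsibility concrete, here is a minimal plain-Java sketch of the pattern discussed in this thread: each stage writes to an intermediate location, the next stage reads from there, and the intermediate data is removed whether or not the later stages succeed. It deliberately does not use the Hadoop or JobControl API; `StagedPipeline`, `Stage`, `runAll` and `deleteRecursively` are hypothetical names made up for illustration, and a local temp directory stands in for an HDFS path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Illustration only: not a Hadoop class, just the chaining-and-cleanup pattern.
class StagedPipeline {

    /** One stage reads from `in` and writes to `out` (hypothetical interface). */
    interface Stage {
        void run(Path in, Path out) throws IOException;
    }

    /**
     * Runs the stages in order. Stage i writes to tmp/stage-i and stage i+1
     * reads from it; the last stage writes to finalOut. The temp directory is
     * deleted in the finally block, so stage 1 output is not left around when
     * stage 2 fails. Returns true only if every stage completed.
     */
    static boolean runAll(List<Stage> stages, Path input, Path finalOut) throws IOException {
        Path tmp = Files.createTempDirectory("pipeline-");
        try {
            Path in = input;
            for (int i = 0; i < stages.size(); i++) {
                boolean last = (i == stages.size() - 1);
                Path out = last ? finalOut : tmp.resolve("stage-" + i);
                stages.get(i).run(in, out);
                in = out; // the next stage reads what this one wrote
            }
            return true;
        } catch (IOException | RuntimeException e) {
            System.err.println("pipeline failed: " + e);
            return false;
        } finally {
            deleteRecursively(tmp); // runs on success and on failure
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

With real MapReduce jobs the same idea would presumably apply to the intermediate output paths on HDFS: whoever drives the jobs deletes them once the chain has finished or failed. Keeping them instead, to avoid recomputing a finished stage, is the 'results cache' trade-off mentioned above.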

Bertrand

On Sat, Nov 24, 2012 at 8:27 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> You probably want something like Oozie, which provides DAG-like flows
> for jobs, so you can easily write "upon-failure" and "upon-success"
> conditions, aside from incorporating complex logic as well.
>
> Otherwise, I guess you could do what Jay has suggested, or look at the
> JobControl classes to avoid some of the extra work needed.
>
> On Sat, Nov 24, 2012 at 3:52 AM, Sean McNamara
> <[EMAIL PROTECTED]> wrote:
> > It's not clear to me how to stitch together multiple map reduce jobs.
> > Without using cascading or something else like it, is the method basically
> > to write to an intermediate spot, and have the next stage read from there?
> >
> > If so, how are jobs responsible for cleaning up the temp/intermediate data
> > they create?  What happens if stage 1 completes, and stage 2 doesn't, do the
> > stage 1 files get left around?
> >
> > Does anyone have some insight they could share?
> >
> > Thanks.
>
>
>
> --
> Harsh J
>

--
Bertrand Dechoux