Hadoop >> mail # user >> Re: Multi-stage map/reduce jobs


Re: Multi-stage map/reduce jobs
You probably want something like Oozie, which provides DAG-like flows
for jobs, so you can easily write "upon-failure" and "upon-success"
conditions, as well as incorporate more complex logic.

Otherwise, I guess you could do what Jay has suggested, or look at the
JobControl classes to avoid some of the extra work needed.
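
For the "write to an intermediate spot" approach, here is a rough sketch of
the driver-side pattern. It uses plain local files and only the Java standard
library, not the Hadoop API, so treat it as an illustration of the control
flow rather than a working MapReduce driver. The class and method names
(TwoStagePipeline, stageOne, stageTwo) are made up for this example. The idea
is that the driver owns the intermediate directory and removes it in a
finally block, so stage 1 output is not left around when stage 2 fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TwoStagePipeline {

    // Hypothetical stand-in for stage 1: writes upper-cased lines
    // into the intermediate directory.
    static void stageOne(List<String> input, Path intermediateDir) throws IOException {
        List<String> out = input.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        Files.write(intermediateDir.resolve("part-00000"), out);
    }

    // Hypothetical stand-in for stage 2: reads stage 1 output and counts lines.
    // The 'fail' flag simulates a stage 2 failure.
    static long stageTwo(Path intermediateDir, boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated stage 2 failure");
        }
        return Files.readAllLines(intermediateDir.resolve("part-00000")).size();
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> ordered = paths.sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        }
    }

    // Runs both stages in order. The intermediate directory is removed
    // whether stage 2 succeeds or throws.
    static long run(List<String> input, Path intermediateDir, boolean failStageTwo)
            throws IOException {
        Files.createDirectories(intermediateDir);
        try {
            stageOne(input, intermediateDir);
            return stageTwo(intermediateDir, failStageTwo);
        } finally {
            deleteRecursively(intermediateDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("pipeline-").resolve("stage1-out");
        System.out.println(run(List.of("a", "b", "c"), tmp, false));
        System.out.println(Files.exists(tmp));
    }
}
```

In a real Hadoop driver the same shape would presumably apply: wait for the
first job with Job.waitForCompletion(true), start the second job only if that
returned true, and remove the intermediate path with FileSystem.delete(path,
true) once you no longer need it. Whether you want to delete on failure or
keep the data around for debugging is a judgment call.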

On Sat, Nov 24, 2012 at 3:52 AM, Sean McNamara
<[EMAIL PROTECTED]> wrote:
> It's not clear to me how to stitch together multiple map/reduce jobs.
> Without using Cascading or something else like it, is the method basically
> to write to an intermediate spot, and have the next stage read from there?
>
> If so, how are jobs responsible for cleaning up the temp/intermediate data
> they create? If stage 1 completes and stage 2 doesn't, do the stage 1
> files get left around?
>
> Does anyone have some insight they could share?
>
> Thanks.

--
Harsh J