Re: Multi-stage map/reduce jobs
Hadoop is not an API for orchestrating MapReduce jobs; fortunately, there is no need for such an API.  Each MapReduce job can simply be run like a normal Java class.

So, how do you run multiple MapReduce jobs?

Easy: you create a main() method in a single class that invokes each job in turn, calling the waitForCompletion() method on each; this method blocks until the individual job completes.
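
For illustration, here is a minimal two-stage driver along those lines (a sketch, not part of the original message; it uses the Hadoop 2.x Job API, and the class name, job names, and paths are placeholders). With no Mapper/Reducer classes set, each job falls back to Hadoop's built-in identity Mapper and Reducer, so the skeleton compiles on its own; plug in real logic via setMapperClass()/setReducerClass().

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiStageDriver {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);  // the temp/intermediate spot
    Path output = new Path(args[2]);

    // Stage 1: reads the raw input, writes to the intermediate directory.
    Job stage1 = Job.getInstance(conf, "stage-1");
    stage1.setJarByClass(MultiStageDriver.class);
    stage1.setOutputKeyClass(LongWritable.class);
    stage1.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(stage1, input);
    FileOutputFormat.setOutputPath(stage1, intermediate);

    // waitForCompletion(true) blocks until the job finishes (printing
    // progress); bail out on failure so stage 2 never reads partial output.
    if (!stage1.waitForCompletion(true)) {
      System.exit(1);
    }

    // Stage 2: its input is exactly what stage 1 wrote.
    Job stage2 = Job.getInstance(conf, "stage-2");
    stage2.setJarByClass(MultiStageDriver.class);
    stage2.setOutputKeyClass(LongWritable.class);
    stage2.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(stage2, intermediate);
    FileOutputFormat.setOutputPath(stage2, output);

    boolean ok = stage2.waitForCompletion(true);

    // Cleanup is the driver's job: delete the intermediate directory once
    // stage 2 succeeds; on failure, leaving it around helps debugging.
    if (ok) {
      FileSystem.get(conf).delete(intermediate, true);  // recursive delete
    }
    System.exit(ok ? 0 : 1);
  }
}

Note that Hadoop itself never cleans up the intermediate directory: if stage 2 fails, stage 1's output stays on HDFS until the driver (or something external) deletes it, which is exactly the cleanup responsibility Sean asks about below.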

Jay Vyas
http://jayunit100.blogspot.com

On Nov 23, 2012, at 5:22 PM, Sean McNamara <[EMAIL PROTECTED]> wrote:

> It's not clear to me how to stitch together multiple MapReduce jobs.  Without using Cascading or something else like it, is the method basically to write to an intermediate spot, and have the next stage read from there?
>
> If so, how are jobs responsible for cleaning up the temp/intermediate data they create?  What happens if stage 1 completes and stage 2 doesn't? Do the stage 1 files get left around?
>
> Does anyone have some insight they could share?
>
> Thanks.