MapReduce >> mail # user >> Multi-stage map/reduce jobs


Multi-stage map/reduce jobs
It's not clear to me how to stitch together multiple map/reduce jobs. Without using Cascading or something like it, is the method basically to write to an intermediate location and have the next stage read from there?

If so, how are jobs responsible for cleaning up the temp/intermediate data they create? What happens if stage 1 completes and stage 2 doesn't? Do the stage 1 files get left around?
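
To make the question concrete, here is a minimal sketch of the pattern being asked about, written in plain Java rather than against the Hadoop API. Everything in it is hypothetical: the class `TwoStageSketch`, the methods `stage1`, `stage2`, and `run`, the `part-00000` file name, and the `failStage2` switch are made up for illustration. It assumes the driver program, not the individual stages, owns the intermediate directory and removes it in a `finally` block, so the stage 1 output is not left behind when stage 2 fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a two-stage pipeline; no Hadoop classes are used.
class TwoStageSketch {
    // Intermediate directory of the most recent run, kept only so cleanup can be checked.
    static Path lastIntermediateDir;

    // Stage 1: split lines into lower-cased words and write one word per line.
    static void stage1(List<String> lines, Path intermediate) throws IOException {
        List<String> words = lines.stream()
                .flatMap(line -> Stream.of(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toList());
        Files.write(intermediate.resolve("part-00000"), words);
    }

    // Stage 2: read the stage 1 output and count occurrences of each word.
    static Map<String, Integer> stage2(Path intermediate, boolean failStage2) throws IOException {
        if (failStage2) {
            throw new IOException("stage 2 failed (simulated)");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : Files.readAllLines(intermediate.resolve("part-00000"))) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Driver: creates the intermediate directory, runs both stages, always cleans up.
    static Map<String, Integer> run(List<String> lines, boolean failStage2) throws IOException {
        Path intermediate = Files.createTempDirectory("stage1-out-");
        lastIntermediateDir = intermediate;
        try {
            stage1(lines, intermediate);
            return stage2(intermediate, failStage2);
        } finally {
            // Runs on success and on failure of either stage.
            deleteRecursively(intermediate);
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

In an actual Hadoop driver the same shape would presumably apply, with the intermediate location being a path on the job's file system and the cleanup being a recursive delete of that path once the last stage finishes. Whether to delete on failure or keep the stage 1 output so that only stage 2 has to be rerun is a design choice the sketch does not settle.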

Does anyone have some insight they could share?

Thanks.