Re: how to implement post-mapper processing
On 08/25/2010 10:36 AM, Anfernee Xu wrote:
> Thanks all for your help.
>
> The challenge is this: suppose I have 4 datanodes in the cluster, but a given
> input has only 2 splits, so only 2 of the 4 nodes will run the M/R job, say
> nodeA and nodeB. After the job completes, the input data has been stored in
> the datastore on nodeA and nodeB, while nodeC and nodeD are untouched. I now
> need to run post-processing on nodeA and nodeB to get my data ready.
> Originally I thought I could run another M/R job, also with 2 splits, but I
> cannot tell which nodes will be selected to run those splits. I expected
> that the same nodes would be selected.
>
> Anfernee

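Well then you could put your post-processing in Mapper.cleanup. It is called
once at the end of each map task, in the same task that processed the split,
so it runs on whichever node stored that split's data and no second job is
needed.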

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup%28org.apache.hadoop.mapreduce.Mapper.Context%29
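
To make the call order concrete, here is a minimal plain-Java sketch with no Hadoop dependency. LifecycleSketch is a hypothetical stand-in, not the Hadoop class: it only mirrors the order in which Mapper.run() invokes setup, map and cleanup. StoreAndFinalizeMapper, localStore and the sort used as "post-processing" are likewise made up for illustration. In a real job you would extend org.apache.hadoop.mapreduce.Mapper and override cleanup(Context) instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the lifecycle of org.apache.hadoop.mapreduce.Mapper.
// It is NOT the Hadoop class; it only mirrors the call order of Mapper.run().
abstract class LifecycleSketch<K, V> {
    protected void setup() {}                      // real API: setup(Context)
    protected abstract void map(K key, V value);   // real API: map(K, V, Context)
    protected void cleanup() {}                    // real API: cleanup(Context)

    // One "task": setup once, map once per record of the split, cleanup once.
    public final void run(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        setup();
        for (int i = 0; i < keys.size(); i++) {
            map(keys.get(i), values.get(i));
        }
        cleanup();
    }
}

// Hypothetical mapper: stores each record locally, then post-processes in cleanup.
class StoreAndFinalizeMapper extends LifecycleSketch<Long, String> {
    // Stand-in for the datastore on the node where this task runs.
    final List<String> localStore = new ArrayList<>();
    boolean finalized = false;

    @Override
    protected void map(Long offset, String line) {
        localStore.add(line.trim());
    }

    @Override
    protected void cleanup() {
        // Post-processing runs in the same task that wrote the data,
        // so it sees exactly the records of this split.
        Collections.sort(localStore);
        finalized = true;
    }
}
```

With 2 splits there are 2 map tasks, and each one performs its own cleanup on the node it happened to run on. A task that is retried after a failure, or run speculatively, may execute cleanup more than once, so the post-processing step should probably be written to be idempotent.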

DR