MapReduce, mail # user - how to implement post-mapper processing

Re: how to implement post-mapper processing
David Rosenstrauch 2010-08-25, 15:08
On 08/25/2010 10:36 AM, Anfernee Xu wrote:
> Thanks, all, for your help.
>
> The challenge is this: suppose I have 4 datanodes in the cluster, but a given
> input has only 2 splits, so only 2 of the 4 nodes (say nodeA and nodeB) run
> the M/R job. After the job completes, the data from the input has been stored
> in the datastore on nodeA and nodeB, while nodeC and nodeD are untouched. Now
> I need to run post-processing on nodeA and nodeB to get my data ready. I
> originally thought I could run another M/R job, also with 2 splits, but I
> cannot control which nodes will be selected to run those splits, and I need
> the same nodes to be selected.
>
> Anfernee

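In that case, you could put your post-processing in Mapper.cleanup(). The framework calls it once per map task, after the last call to map(), inside the same task, so it runs on the same node that processed that split.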

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup%28org.apache.hadoop.mapreduce.Mapper.Context%29
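A minimal sketch of that approach, assuming the new org.apache.hadoop.mapreduce API linked above. The class name PostProcessingMapper, the record counter, and the datastore steps (left as comments) are hypothetical placeholders for your own logic; only setup/map/cleanup and Context.setStatus come from Hadoop itself.

```java
import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical mapper: stores each record on the node-local datastore in
 * map(), then runs the per-node post-processing once in cleanup(), which
 * executes in the same task (and therefore on the same node) as map().
 */
public class PostProcessingMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  // Number of records this task has handled; used only for the status message.
  private long recordsStored = 0;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Hypothetical: open a connection to the node-local datastore here.
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Hypothetical: write `value` into the node-local datastore here.
    recordsStored++;
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Called once per task, after the last map() call for this split.
    String host = InetAddress.getLocalHost().getHostName();
    context.setStatus("post-processing " + recordsStored + " records on " + host);
    // Hypothetical: run the post-processing against the local datastore here.
  }
}
```

One caveat with this design: cleanup() runs per task attempt, so a failed task that is retried on another node, or a speculative duplicate attempt, will run it there as well. If the post-processing is not idempotent, you may want to turn off speculative map execution (mapred.map.tasks.speculative.execution in 0.20) for this job.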

DR