Re: how to implement post-mapper processing
On 08/25/2010 10:36 AM, Anfernee Xu wrote:
> Thanks all for your help.
> The challenge is this: suppose I have 4 datanodes in the cluster, but a given
> input has only 2 splits, so only 2 of the 4 nodes, say nodeA and nodeB, will
> run the M/R job. After the job completes, the input data has been stored in
> the datastore on nodeA and nodeB, while nodeC and nodeD are untouched. I now
> need to run post-processing on nodeA and nodeB to get my data ready.
> Originally I thought I could run another M/R job, also with 2 splits, but I
> cannot tell which nodes will be selected to run those splits, and I would
> want the same nodes to be selected.
> Anfernee

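Well then you could put your post-processing in Mapper.cleanup. It is called
once per map task, after the last record of the split has been mapped, in the
same task and therefore on the same node that loaded the data.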
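
To make the idea concrete, here is a minimal sketch of the setup / map / cleanup lifecycle. It is plain Java with no Hadoop dependency so it runs on its own: SketchMapper, StoreLoadingMapper, localStore and log are all hypothetical names invented for illustration, not Hadoop classes. In a real job you would instead extend org.apache.hadoop.mapreduce.Mapper and override cleanup(Context), and the post-processing itself would be whatever your datastore needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the lifecycle of a Hadoop map task:
// setup once, map once per record, cleanup once at the end.
abstract class SketchMapper {
    protected void setup() {}
    protected abstract void map(String record);
    protected void cleanup() {}

    // One call to run() plays the role of one map task working on one split.
    public final void run(List<String> split) {
        setup();
        for (String record : split) {
            map(record);
        }
        cleanup(); // fires after the last record, inside the same task
    }
}

// Loads records into a (hypothetical) node-local store, then post-processes it.
class StoreLoadingMapper extends SketchMapper {
    final List<String> localStore = new ArrayList<>(); // assumption: stands in for the datastore
    final List<String> log = new ArrayList<>();

    @Override
    protected void map(String record) {
        localStore.add(record);
        log.add("map:" + record);
    }

    @Override
    protected void cleanup() {
        // Post-processing goes here; it sees everything this task stored.
        log.add("post-process:" + localStore.size());
    }
}

public class CleanupSketch {
    public static void main(String[] args) {
        StoreLoadingMapper mapper = new StoreLoadingMapper();
        mapper.run(Arrays.asList("a", "b", "c"));
        System.out.println(mapper.log); // [map:a, map:b, map:c, post-process:3]
    }
}
```

Because the post-processing happens inside the task that wrote the data, there is no second job and no need to guess where its splits would be scheduled. Note that it runs once per map task, so a node that handles several splits would post-process once for each of them.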