On 08/25/2010 10:36 AM, Anfernee Xu wrote:
> Thanks all for your help.
> The challenge is this: suppose I have 4 datanodes in the cluster, but a given
> input has only 2 splits, so only 2 of the 4 nodes will run the M/R job, say
> nodeA and nodeB. After the job completes, the data from the input has been
> stored in the datastore on nodeA and nodeB, while nodeC and nodeD are
> untouched. I now need to run a post-processing step on nodeA and nodeB to get
> my data ready. Originally I thought I could run another M/R job, also with 2
> splits, but I cannot tell which nodes will be selected to run those splits,
> and I would want the same nodes to be selected.
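Well, then you could put your post-processing in Mapper.cleanup(), which is
called once at the end of each map task, in the same task (and so on the same
node) that processed the split.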
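
A minimal sketch of that idea, not real Hadoop code: TaskLifecycle below is a
hypothetical stand-in that mirrors the setup / map / cleanup order of
org.apache.hadoop.mapreduce.Mapper, so it runs without Hadoop on the classpath.
In a real job you would extend Mapper and override cleanup(Context) instead.
The in-memory store and the upper-casing step are placeholders for your
datastore and your actual post-processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the lifecycle of one Hadoop map task:
// setup once, map once per record, cleanup once at the end.
abstract class TaskLifecycle {
    protected void setup() {}
    protected abstract void map(String record);
    protected void cleanup() {}

    // Drives one task over one split, in the same order Mapper.run uses.
    final void run(List<String> split) {
        setup();
        for (String record : split) {
            map(record);
        }
        cleanup();
    }
}

// Hypothetical mapper: loads records into a task-local store, then
// post-processes that store after the whole split has been consumed.
class LoadAndPostProcess extends TaskLifecycle {
    final List<String> localStore = new ArrayList<>();
    boolean postProcessed = false;

    @Override
    protected void map(String record) {
        localStore.add(record); // placeholder for "write to the datastore"
    }

    @Override
    protected void cleanup() {
        // Same task as the map calls above, hence the same node.
        localStore.replaceAll(String::toUpperCase); // placeholder post-processing
        postProcessed = true;
    }
}

class CleanupDemo {
    // Runs one simulated map task over the given split and returns it.
    static LoadAndPostProcess runTask(List<String> split) {
        LoadAndPostProcess task = new LoadAndPostProcess();
        task.run(split);
        return task;
    }

    public static void main(String[] args) {
        System.out.println(runTask(Arrays.asList("a", "b")).localStore);
    }
}
```

Keep in mind that cleanup is called once per map task, not once per node, so a
node that ends up running several map tasks would run the post-processing
several times.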