Hmm, this is interesting. I think that:
1) For the map phase, Hadoop is smart enough to try to run mappers
locally, but I think you could force these DNs to actively participate in a
map job by decreasing the size of the input splits. That would produce
many more mappers, some of which would be forced to run against data that is
not local - in this scenario, those DNs don't yet hold any
local blocks of the input.
2) For the reduce phase - since the reducers copy
mapper outputs from all over the cluster, one would expect your new data
nodes to naturally take part in this portion of the job, provided the
number of reducers (the mapred.reduce.tasks property) was set high enough.
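Concretely (again assuming the Hadoop 1.x property name), requesting more reduce tasks per job could look like:

```
<!-- Hypothetical example: ask for 8 reduce tasks so the scheduler has
     enough tasks to place some on the newly added TaskTrackers. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value>
</property>
```

The default of a single reducer would give the new nodes at most one chance to participate, so raising this is usually what makes them visible in the reduce phase.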
On Thu, Nov 8, 2012 at 9:35 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
> “Hadoop ships the code to the data instead of sending the data to the
> code.”
> Say you added two DNs/TTs to the cluster. They have no data at this point,
> i.e. you have not run the balancer.
> In view of the above quoted statement, will these two nodes not
> participate in the MapReduce job until you have balanced some data onto those
> nodes? Please kindly elaborate.