Kartashov, Andy 2012-11-08, 14:35
Jay Vyas 2012-11-08, 14:49
Re: Hadoop processing
Mohammad Tariq 2012-11-08, 15:05
Hello Andy,
Just to add to what Mr. Jay has said, the MR framework does its best to run the map task on a node where the input data is present. Sometimes, however, none of the nodes hosting a replica of the data block for a map task’s input split (how many there are depends on the replication factor) has a free slot. In that case, the job scheduler will look for a free map slot on a node in the same rack as one of the blocks. Very occasionally even this is not possible, so an off-rack node is used.

Regards,
Mohammad Tariq

On Thu, Nov 8, 2012 at 8:19 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Hmm this is interesting. I think that:
>
> 1) For the map phases, Hadoop is smart enough to try to run mappers
> locally, but I think you could force these DNs to actively participate in a
> Mapper job by decreasing the size of input splits, which would allow for
> many more mappers, some of which would be forced to run on files which were
> not necessarily local - in this scenario, those DNs don't yet have any
> local files on them that would be used for the input.
>
> 2) For the reducer phases - since of course the reducers will be copying
> mapper outputs from all over the cluster, one would expect that your Data
> nodes would naturally take part in this portion of the task if the
> num.reducers parameter was specified.
>
> On Thu, Nov 8, 2012 at 9:35 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>
>> Hadoopers,
>>
>> “Hadoop ships the code to the data instead of sending the data to the
>> code.”
>>
>> Say you added two DNs/TTs to the cluster. They have no data at this
>> point, i.e. you have not run the balancer.
>>
>> In view of the above quoted statement, will these two nodes not
>> participate in the MapReduce job until you balanced some data onto those
>> nodes? Please kindly elaborate.
>>
>> Rgds,
>>
>> AK47
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
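To make the fallback order described in the reply above concrete (data-local first, then rack-local, then off-rack), here is a minimal Python sketch. It is not Hadoop code: the `Node` class, the `pick_node` function, and the node and rack names are all made up for illustration. It also simplifies things by choosing a node for a given task, whereas the real scheduler hands out tasks as TaskTrackers report free slots. It only aims to show why freshly added DNs/TTs with no blocks can still end up running map tasks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a DataNode/TaskTracker pair."""
    name: str
    rack: str
    free_slots: int


def pick_node(block_hosts, nodes):
    """Pick a node for one map task, preferring the closest free slot.

    block_hosts: names of the nodes that hold a replica of the input block.
    Returns (node, locality), or (None, None) if no slot is free anywhere.
    """
    free = [n for n in nodes if n.free_slots > 0]
    if not free:
        return None, None

    by_name = {n.name: n for n in nodes}
    host_racks = {by_name[h].rack for h in block_hosts if h in by_name}

    # 1) A node that already holds a replica of the block.
    for n in free:
        if n.name in block_hosts:
            return n, "data-local"

    # 2) A node in the same rack as one of the replicas.
    for n in free:
        if n.rack in host_racks:
            return n, "rack-local"

    # 3) Any node with a free slot; the block is read across racks.
    return free[0], "off-rack"


# Assumed cluster: dn1-dn3 hold the replicas but are busy,
# dn4 and dn5 were just added and hold no data yet.
cluster = [
    Node("dn1", "rack1", 0),
    Node("dn2", "rack1", 0),
    Node("dn3", "rack2", 0),
    Node("dn4", "rack1", 2),
    Node("dn5", "rack2", 2),
]

node, locality = pick_node({"dn1", "dn2", "dn3"}, cluster)
print(node.name, locality)  # prints "dn4 rack-local"
```

In this toy setup the empty node dn4 gets the task because every replica holder is busy, which matches the point made in the thread: new nodes are not excluded from the job, they are just lower in the preference order until the balancer moves data onto them.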
Michael Segel 2012-11-08, 15:03
Kartashov, Andy 2012-11-08, 15:57