rshepherd 2012-12-03, 18:08
Mostafa Elhemali 2012-12-03, 18:26
Re: Question about intermediate kv pair files
rshepherd 2012-12-03, 18:28
Thanks Mostafa! Very much appreciated.
On 12/3/12 1:26 PM, Mostafa Elhemali wrote:
> (Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully
> the list will correct any details I get wrong)
>
> In Hadoop 1: the mapper would put the file in a well-known location on the
> machine (encoded by user, job ID and map ID), then TaskTracker would serve
> it over HTTP to the reducer when it requests it (authenticated using a
> secret token in the job). Look in the MapOutputServlet class in TaskTracker
> for most of the related code.
>
> In YARN: similar thing, except that now it's a NodeManager plug-in
> (auxiliary service) that serves the map output since there's no TaskTracker
> anymore. Look at the ShuffleHandler class in the
> hadoop-mapreduce-client-shuffle project. I see comments in the code
> indicating that this will be changed from a NodeManager plug-in in the
> future, but I don't know much about that.
>
> Hope it helps,
> Mostafa
>
>
> On Mon, Dec 3, 2012 at 10:08 AM, rshepherd <[EMAIL PROTECTED]> wrote:
>
>> Hi folks,
>>
>> Can anyone explain to me briefly how each mapper reports the
>> location of the intermediate kv partition files to the master? And, if
>> possible, where in the code I might find where that happens?
>>
>> Thanks for any help,
>> Randy
>>
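
For anyone reading this thread later, the flow Mostafa describes can be sketched roughly as below. This is only an illustration of the idea (a path derived from user, job ID and map ID, plus a per-job secret checked before serving), not Hadoop's actual code: the directory layout, the file name `file.out`, the function names, and the choice of HMAC-SHA256 are all assumptions made for the example. For the real logic, see MapOutputServlet and ShuffleHandler as mentioned above.

```python
import hashlib
import hmac
import os


def map_output_path(local_dir, user, job_id, map_id):
    """Derive a well-known path from user, job ID and map ID.

    The layout and the file name are hypothetical, chosen only to show
    that the location can be computed rather than looked up.
    """
    return os.path.join(local_dir, user, job_id, map_id, "file.out")


def sign_request(job_secret, url):
    """Sign the request URL with the per-job secret (assumed HMAC-SHA256)."""
    return hmac.new(job_secret, url.encode("utf-8"), hashlib.sha256).hexdigest()


def serve_map_output(local_dir, job_secret, user, job_id, map_id, url, signature):
    """Return the map output bytes if the caller proves it knows the job secret."""
    expected = sign_request(job_secret, url)
    # Constant-time comparison, so timing does not leak the expected signature.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("request not signed with this job's secret")
    with open(map_output_path(local_dir, user, job_id, map_id), "rb") as f:
        return f.read()
```

In this sketch the reducer side would build the same URL, call `sign_request` with the secret it received with the job, and send both. The serving side needs only the IDs in the request to locate the file.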