Question about intermediate kv pair files
rshepherd 2012-12-03, 18:08
Hi folks,
Can anyone briefly explain how each mapper reports the location of its intermediate kv partition files to the master? And, if possible, where in the code I might find that?
Thanks for any help, Randy
Re: Question about intermediate kv pair files
Mostafa Elhemali 2012-12-03, 18:26
(Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully the list will correct any details I get wrong)
In Hadoop 1: the mapper puts the file in a well-known location on the local machine (the path encodes the user, job ID and map ID), and the TaskTracker then serves it over HTTP to each reducer that requests it (requests are authenticated using a secret token from the job). Look in the MapOutputServlet class in TaskTracker for most of the related code.
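To make the Hadoop 1 description a bit more concrete, here is a rough sketch of how the local path and the fetch URL could be put together. This is not Hadoop code: the class and method names are made up for illustration, and the directory layout, the "/mapOutput" path, the query parameter names and the port used in main are assumptions from my reading of the code, so check them against TaskTracker and MapOutputServlet before relying on them.

```java
// Hypothetical sketch, not part of Hadoop. Layout and URL shape are assumptions.
public class MapOutputLocationSketch {

    // Assumed layout, relative to one of the TaskTracker's local directories:
    // taskTracker/<user>/jobcache/<jobId>/<mapAttemptId>/output/file.out
    static String localOutputPath(String user, String jobId, String mapAttemptId) {
        return String.join("/",
                "taskTracker", user, "jobcache", jobId, mapAttemptId, "output", "file.out");
    }

    // Assumed request a reducer sends to the TaskTracker that ran the map.
    // The server side would look up the file above and stream one partition back.
    static String fetchUrl(String host, int port, String jobId,
                           String mapAttemptId, int reducePartition) {
        return "http://" + host + ":" + port + "/mapOutput"
                + "?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + reducePartition;
    }

    public static void main(String[] args) {
        // Example IDs are invented; the port is an assumed TaskTracker HTTP port.
        String job = "job_201212031808_0001";
        String map = "attempt_201212031808_0001_m_000000_0";
        System.out.println(localOutputPath("randy", job, map));
        System.out.println(fetchUrl("tt-host", 50060, job, map, 3));
    }
}
```

The point is only that nothing location-specific has to be reported beyond which map attempt finished on which host: the reducer can derive the request from the job ID, the map attempt ID and its own partition number.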
In Yarn: similar thing, except that now it's a NodeManager plug-in (auxiliary service) that serves the map output, since there's no TaskTracker anymore. Look at the ShuffleHandler class in the hadoop-mapreduce-client-shuffle project. I see comments in the code indicating that this will be changed from a NodeManager plug-in in the future, but I don't know much about that.
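For the serving side, a minimal sketch of what handling such a request might involve: pull the job, reduce partition and map attempt IDs out of the query string, then locate and stream the matching output. The class below is hypothetical and only does the parsing step. The parameter names and the comma-separated list of map IDs are assumptions based on my reading of ShuffleHandler, not a copy of its code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Hadoop. Parameter names are assumptions.
public class ShuffleRequestSketch {
    final String jobId;
    final int reduce;
    final List<String> mapIds;

    ShuffleRequestSketch(String jobId, int reduce, List<String> mapIds) {
        this.jobId = jobId;
        this.reduce = reduce;
        this.mapIds = mapIds;
    }

    // Parses a query string such as "job=...&reduce=3&map=attemptA,attemptB".
    static ShuffleRequestSketch parse(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        String job = params.get("job");
        String reduce = params.get("reduce");
        String map = params.get("map");
        if (job == null || reduce == null || map == null) {
            throw new IllegalArgumentException("job, reduce and map are all required");
        }
        // Assumption: one request may ask for several map outputs at once.
        return new ShuffleRequestSketch(job, Integer.parseInt(reduce),
                Arrays.asList(map.split(",")));
    }

    public static void main(String[] args) {
        ShuffleRequestSketch r = parse("job=job_1&reduce=3&map=attempt_a,attempt_b");
        System.out.println(r.jobId + " " + r.reduce + " " + r.mapIds);
    }
}
```

A real handler would also verify the job's secret token before serving anything, which this sketch leaves out.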
Hope it helps,
Mostafa
Re: Question about intermediate kv pair files
rshepherd 2012-12-03, 18:28
Thanks Mostafa! Very much appreciated.