Re: Question about intermediate kv pair files
Thanks Mostafa! Very much appreciated.

On 12/3/12 1:26 PM, Mostafa Elhemali wrote:
> (Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully
> the list will correct any details I get wrong)
>
> In Hadoop 1: the mapper would put the file in a well-known location on the
> machine (encoded by user, job ID and map ID) then TaskTracker would serve
> it over HTTP to the reducer when it requests it (authenticated using a
> secret token in the job). Look in the MapOutputServlet class in TaskTracker
> for most of the related code.
>
> In YARN: similar thing, except that now it's a NodeManager plug-in
> (auxiliary service) that serves the map output since there's no TaskTracker
> anymore. Look at the ShuffleHandler class in the
> hadoop-mapreduce-client-shuffle project. I see comments in the code
> indicating that this will be changed from a NodeManager plug-in in the
> future, but I don't know much about that.
>
> Hope it helps,
> Mostafa
>
>
> On Mon, Dec 3, 2012 at 10:08 AM, rshepherd <[EMAIL PROTECTED]> wrote:
>
>> Hi folks,
>>
>> Can anyone explain to me briefly how each mapper reports the
>> location of the intermediate kv partition files to the master? And, if
>> possible, where in the code I might find where that happens?
>>
>> Thanks for any help,
>> Randy
>>
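
For readers following the thread, here is a rough Java sketch of the idea Mostafa describes: the map output sits at a location derived from user, job ID and map ID, and the reducer pulls its partition over HTTP from the node that holds it. Everything in it is an assumption made for illustration and is not Hadoop's actual API: the class name ShuffleSketch, both method names, the directory layout, the file name file.out, the /mapOutput path and its query parameters. The secret-token authentication mentioned in the reply is left out. For the real logic, see MapOutputServlet (Hadoop 1) or ShuffleHandler (YARN).

```java
import java.util.StringJoiner;

/** Illustrative sketch only; names and layout are assumptions, not Hadoop's API. */
public class ShuffleSketch {

    /**
     * Hypothetical "well-known location" for one map task's output,
     * keyed by user, job ID and map ID as described in the thread.
     */
    static String mapOutputPath(String localDir, String user, String jobId, String mapId) {
        StringJoiner path = new StringJoiner("/");
        path.add(localDir).add(user).add(jobId).add(mapId).add("file.out");
        return path.toString();
    }

    /**
     * Hypothetical URL a reducer might request from the node serving the
     * map output. The endpoint and parameter names are assumed.
     */
    static String fetchUrl(String host, int port, String jobId, String mapId, int reduce) {
        return "http://" + host + ":" + port + "/mapOutput?job=" + jobId
                + "&map=" + mapId + "&reduce=" + reduce;
    }

    public static void main(String[] args) {
        // Made-up identifiers, purely for demonstration.
        System.out.println(mapOutputPath("/tmp/local", "randy", "job_1", "map_0"));
        System.out.println(fetchUrl("node1", 8080, "job_1", "map_0", 3));
    }
}
```

The point of the sketch is that the serving side can rebuild the file location from the identifiers in the request alone, which is what lets a TaskTracker servlet or a NodeManager auxiliary service answer a reducer's request.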