MapReduce dev mailing list: Question about intermediate kv pair files


Re: Question about intermediate kv pair files
(Disclaimer: I'm not an expert, but I've looked at that code quite a bit. Hopefully
the list will correct any details I get wrong.)

In Hadoop 1: the mapper puts its output file in a well-known location on the
machine (derived from the user, job ID and map ID), and the TaskTracker then
serves it over HTTP to a reducer when that reducer requests it (authenticated
using a secret token belonging to the job). Look at the MapOutputServlet class
in TaskTracker for most of the related code.
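To make the Hadoop 1 flow above concrete, here is a minimal sketch under stated assumptions. The class `MapOutputLocator`, the directory layout, the `/mapOutput` URL shape and its parameter names are illustrative only and not necessarily what MapOutputServlet really uses. The email only says requests are "authenticated using a secret token in the job"; the HMAC-over-the-URL scheme below is an assumed stand-in for that, not Hadoop's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: layout, URL and signing scheme are assumptions for illustration.
class MapOutputLocator {

    // Mapper side: the location is a pure function of (user, job, map), so a
    // reducer only needs to know which node ran the map, not a full path.
    static Path localOutputPath(Path localRoot, String user, String jobId, String mapId) {
        return localRoot.resolve(user).resolve(jobId).resolve(mapId).resolve("file.out");
    }

    // Reducer side: URL for one reducer's partition of one map's output,
    // served by the node (host:port) that ran that map.
    static String fetchUrl(String hostPort, String jobId, String mapId, int reduce) {
        return "http://" + hostPort + "/mapOutput?job=" + jobId
                + "&map=" + mapId + "&reduce=" + reduce;
    }

    // Assumed auth scheme: both sides hold the job's secret, so the reducer
    // signs the URL and the server recomputes the signature to check it.
    static String sign(String url, byte[] jobSecret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(jobSecret, "HmacSHA256"));
            byte[] tag = mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server side: reject the request unless the signature matches.
    static boolean verify(String url, String signature, byte[] jobSecret) {
        byte[] expected = sign(url, jobSecret).getBytes(StandardCharsets.UTF_8);
        byte[] given = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, given); // constant-time comparison
    }

    public static void main(String[] args) {
        byte[] secret = "job-secret".getBytes(StandardCharsets.UTF_8);
        String url = fetchUrl("worker7:8080", "job_0001", "m_000003", 2); // hypothetical host
        String sig = sign(url, secret);
        System.out.println(localOutputPath(Paths.get("local"), "alice", "job_0001", "m_000003"));
        System.out.println(url + " valid=" + verify(url, sig, secret));
    }
}
```

The point of the sketch is the division of labour: the mapper never has to publish a path, and the serving side never has to trust the caller, because anyone without the job's secret cannot produce a matching signature.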

In YARN: the same idea, except that the map output is now served by a
NodeManager plug-in (an auxiliary service), since there's no TaskTracker
anymore. Look at the ShuffleHandler class in the
hadoop-mapreduce-client-shuffle project. I see comments in the code
indicating that this will be changed from a NodeManager plug-in in the
future, but I don't know much about that.
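The YARN arrangement described above is essentially a plug-in model: the NodeManager knows nothing about MapReduce and simply hosts services it knows by name. The sketch below shows that shape only. `ToyNodeManager`, `AuxService` and `ToyShuffleService` are hypothetical names, not YARN's real API (the real contract is whatever ShuffleHandler implements), and the assumption that each plug-in is handed per-application data such as the job's secret when an application starts is labeled as such in the code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node manager: hosts whatever services it was configured with
// and forwards application lifecycle events to them by name.
class ToyNodeManager {
    private final Map<String, AuxService> services = new HashMap<>();

    void register(AuxService s) { services.put(s.name(), s); }

    // Assumption: each application may carry opaque data for named services.
    void startApplication(String appId, Map<String, byte[]> serviceData) {
        for (Map.Entry<String, byte[]> e : serviceData.entrySet()) {
            AuxService s = services.get(e.getKey());
            if (s == null) throw new IllegalArgumentException("no aux service: " + e.getKey());
            s.initApp(appId, e.getValue());
        }
    }

    void finishApplication(String appId) {
        for (AuxService s : services.values()) s.stopApp(appId);
    }

    public static void main(String[] args) {
        ToyShuffleService shuffle = new ToyShuffleService();
        ToyNodeManager nm = new ToyNodeManager();
        nm.register(shuffle);
        Map<String, byte[]> data = new HashMap<>();
        data.put("toy_shuffle", new byte[] {1, 2, 3}); // stand-in for a job secret
        nm.startApplication("app_0001", data);
        System.out.println("shuffle knows app_0001: " + shuffle.knowsJob("app_0001"));
        nm.finishApplication("app_0001");
        System.out.println("after finish: " + shuffle.knowsJob("app_0001"));
    }
}

// Hypothetical plug-in contract, loosely modeled on the "auxiliary service" idea.
interface AuxService {
    String name();
    void initApp(String appId, byte[] payload);
    void stopApp(String appId);
}

// Hypothetical shuffle plug-in: remembers each job's secret so it could later
// authenticate reducers' fetch requests, and forgets it when the job ends.
class ToyShuffleService implements AuxService {
    private final Map<String, byte[]> jobSecrets = new HashMap<>();

    public String name() { return "toy_shuffle"; }
    public void initApp(String appId, byte[] payload) { jobSecrets.put(appId, payload.clone()); }
    public void stopApp(String appId) { jobSecrets.remove(appId); }
    boolean knowsJob(String appId) { return jobSecrets.containsKey(appId); }
}
```

Keeping the shuffle outside the NodeManager proper means the NodeManager stays generic; only the plug-in needs to understand map outputs and job tokens.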

Hope it helps,
Mostafa
On Mon, Dec 3, 2012 at 10:08 AM, rshepherd <[EMAIL PROTECTED]> wrote:

> Hi folks,
>
> Can anyone explain to me briefly how each mapper reports the
> location of the intermediate kv partition files to the master? And, if
> possible, where in the code I might find where that happens?
>
> Thanks for any help,
> Randy
>