MapReduce, mail # user - RE: Shuffle phase


RE: Shuffle phase
John Lilley 2013-05-22, 14:58
I was reading the elephant book trying to understand which process actually serves up the HTTP transfer on the mapper side. Is it each map task? Or is there some persistent task on each worker that serves up mapper output for all map tasks?
Thanks,
John
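
For reference, a minimal sketch (assuming a YARN/MRv2 cluster; the property names below are the standard Hadoop 2 keys, not anything quoted in this thread) of the configuration that wires up the per-node shuffle server. On YARN that server is the ShuffleHandler auxiliary service inside each NodeManager, i.e. one persistent service per worker node rather than something each map task hosts:

import org.apache.hadoop.conf.Configuration;

// Sketch only: the yarn-site.xml entries that enable the per-node shuffle
// service, expressed through the Configuration API for illustration.
public class ShuffleServiceSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Auxiliary services each NodeManager runs; "mapreduce_shuffle" is the
        // conventional name for the MapReduce shuffle handler.
        conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");

        // The class backing that service: one long-running HTTP server per
        // worker node, serving map output for every map task on that node.
        conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
                 "org.apache.hadoop.mapred.ShuffleHandler");

        System.out.println(conf.get("yarn.nodemanager.aux-services"));
        System.out.println(
            conf.get("yarn.nodemanager.aux-services.mapreduce_shuffle.class"));
    }
}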

From: Kai Voigt [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 21, 2013 12:59 PM
To: [EMAIL PROTECTED]
Subject: Re: Shuffle phase replication factor

The map output doesn't get written to HDFS. The map task writes its output to its local disk, and the reduce tasks then pull the data over HTTP for further processing.
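
To make that concrete, here is a small sketch (assuming the standard MRv2 property names and defaults, which are not quoted in this thread) of the two locations involved: map output stays on the worker's local disks, and reducers fetch it over HTTP from the node's shuffle port, so no HDFS replication factor ever applies to it:

import org.apache.hadoop.conf.Configuration;

// Sketch only: where map output lives and where reducers fetch it from.
public class ShuffleLocationsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Map tasks spill and write their final output under these node-local
        // directories (default: ${hadoop.tmp.dir}/mapred/local), not into HDFS.
        System.out.println("map output dirs: "
                + conf.get("mapreduce.cluster.local.dir",
                           "${hadoop.tmp.dir}/mapred/local"));

        // Reduce tasks pull that output over HTTP from the node's shuffle
        // service; 13562 is the usual Hadoop 2 default port.
        System.out.println("shuffle port: "
                + conf.get("mapreduce.shuffle.port", "13562"));
    }
}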

On 21.05.2013 at 19:57, John Lilley <[EMAIL PROTECTED]> wrote:
When MapReduce enters "shuffle" to partition the tuples, I am assuming that it writes intermediate data to HDFS.  What replication factor is used for those temporary files?
john
--
Kai Voigt
[EMAIL PROTECTED]