MapReduce user mailing list: what is mapred.reduce.parallel.copies?


Re: what is mapred.reduce.parallel.copies?
Ted Yu 2011-06-28, 20:59
Which Hadoop version are you using?
If it is 0.20.2, mapred.reduce.parallel.copies is the number of copying
threads in ReduceTask.

In the scenario you described, at least 2 concurrent connections to a single
node would be made.
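A minimal sketch of the arithmetic behind "at least 2", assuming each of the 5 copier threads fetches a different map output at the same moment and nothing limits fetches per host. It is a pigeonhole-principle illustration only and does not model how ReduceTask actually chooses hosts; the function names are hypothetical.

```python
import math
from typing import List


def min_busiest_node_fetches(copier_threads: int, nodes: int, map_outputs: int) -> int:
    """Lower bound on concurrent fetches hitting the busiest node for ONE reducer.

    Hypothetical model: every copier thread fetches a distinct map output at
    the same time and there is no per-host limit. By the pigeonhole
    principle, some node must serve at least ceil(active / nodes) fetches.
    """
    # A reducer cannot run more fetches than there are map outputs to copy.
    active = min(copier_threads, map_outputs)
    return math.ceil(active / nodes)


def round_robin_fetch_counts(copier_threads: int, nodes: int) -> List[int]:
    """Spread fetches as evenly as possible: thread i goes to node i % nodes."""
    counts = [0] * nodes
    for thread in range(copier_threads):
        counts[thread % nodes] += 1
    return counts


# Scenario from the question: 5 copier threads, 4 nodes, 800 map outputs.
print(min_busiest_node_fetches(5, 4, 800))  # 2
print(round_robin_fetch_counts(5, 4))       # [2, 1, 1, 1]
```

Even the most even spread leaves one node serving two fetches, so the bound is tight; in a real cluster the count also depends on which hosts the copier picks.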

I am not familiar with newer versions of hadoop.

On Tue, Jun 28, 2011 at 11:31 AM, Virajith Jalaparti
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a question about the "mapred.reduce.parallel.copies" configuration
> parameter in Hadoop. The mapred-default.xml file describes it as "The default
> number of parallel transfers run by reduce during the copy(shuffle) phase."
> Is this the number of slave nodes from which a reduce task reads in
> parallel? Or is it the number of intermediate map outputs that a reduce
> task can read in parallel?
>
> For example, suppose I have 4 slave nodes and run a job with 800 maps and 4
> reducers with mapred.reduce.parallel.copies=5. Can each reduce task read
> from all 4 nodes in parallel, i.e., make only 4 concurrent connections, one
> to each node? Or can it read 5 of the 800 map outputs in parallel, i.e.,
> make at least 2 concurrent connections to a single node?
>
> In essence, I am trying to determine how many reducers would concurrently
> access a single disk in a given Hadoop cluster, for any job configuration,
> as a function of the various parameters that can be specified in the
> configuration files.
>
> Thanks,
> Virajith
>
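One way to frame the question asked above is as bounds on concurrent shuffle fetches served by one node. This is a minimal sketch under loudly stated assumptions: one disk per node, every running reducer keeps all of its copier threads busy, no per-host fetch limit, and enough map outputs on every node. `reduce_slots_per_node` stands in for the per-TaskTracker reduce slot limit (mapred.tasktracker.reduce.tasks.maximum in 0.20-era configs); the value 2 in the example is an assumption, and the function names are hypothetical. It counts fetches rather than distinct reducers; the number of distinct reducers touching one disk at once is at most `concurrent_reducers(...)`.

```python
import math
from typing import Tuple


def concurrent_reducers(reduce_tasks: int, nodes: int, reduce_slots_per_node: int) -> int:
    """Reducers that can run at once: capped by the job and by cluster reduce slots."""
    return min(reduce_tasks, nodes * reduce_slots_per_node)


def busiest_node_fetch_bounds(
    reduce_tasks: int, nodes: int, reduce_slots_per_node: int, parallel_copies: int
) -> Tuple[int, int]:
    """(lower, upper) bounds on concurrent fetches served by the busiest node.

    Hypothetical model: one disk per node, all copier threads busy,
    no per-host limit, and map outputs available on every node.
    """
    running = concurrent_reducers(reduce_tasks, nodes, reduce_slots_per_node)
    total = running * parallel_copies
    lower = math.ceil(total / nodes)  # pigeonhole: some node gets at least the average
    upper = total                     # worst case: every fetch targets the same node
    return lower, upper


# Scenario from the question, assuming 2 reduce slots per node:
print(busiest_node_fetch_bounds(4, 4, 2, 5))  # (5, 20)
```

The real figure falls somewhere between the two bounds and depends on how the copier spreads fetches across hosts.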