MapReduce >> mail # user >> what is mapred.reduce.parallel.copies?


Re: what is mapred.reduce.parallel.copies?
Which Hadoop version are you using?
If it is 0.20.2, mapred.reduce.parallel.copies is the number of copier
threads in ReduceTask.

In the scenario you described, at least 2 concurrent connections to a single
node would be made.
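
The arithmetic behind that figure can be sketched as a pigeonhole bound. This is only an illustration, not Hadoop code: the class and method names are hypothetical, and it assumes all copier threads are busy at the same moment and that nothing limits how many of them fetch from one host at a time (the thread does not confirm how 0.20.2 schedules hosts).

```java
// Hypothetical helper, not part of Hadoop: estimates how many concurrent
// connections the busiest node must receive from ONE reduce task.
public class ParallelCopyEstimate {

    /**
     * Pigeonhole lower bound: if parallelCopies fetches are in flight at once
     * and they are spread over `nodes` hosts, at least one host serves
     * ceil(parallelCopies / nodes) of them.
     * Assumption: no per-host limit on simultaneous fetches.
     */
    static int busiestNodeLowerBound(int parallelCopies, int nodes) {
        if (parallelCopies < 0 || nodes <= 0) {
            throw new IllegalArgumentException("need parallelCopies >= 0 and nodes > 0");
        }
        // Integer ceiling division.
        return (parallelCopies + nodes - 1) / nodes;
    }

    public static void main(String[] args) {
        int nodes = 4;           // slave nodes in the example
        int parallelCopies = 5;  // mapred.reduce.parallel.copies in the example
        System.out.println(busiestNodeLowerBound(parallelCopies, nodes)); // prints 2
    }
}
```

With 4 reducers each running 5 copier threads, the same reasoning applies per reducer; how the connections from different reducers overlap on one node depends on scheduling that this sketch does not model.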

I am not familiar with newer versions of Hadoop.

On Tue, Jun 28, 2011 at 11:31 AM, Virajith Jalaparti
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a question about the "mapred.reduce.parallel.copies" configuration
> parameter in Hadoop. The mapred-default.xml file says it is "The default
> number of parallel transfers run by reduce
>   during the copy(shuffle) phase."
> Is this the number of slave nodes from which a reduce task reads in
> parallel? Or is it the number of intermediate map task outputs that a
> reduce task can read in parallel?
>
> For example, suppose I have 4 slave nodes and run a job with 800 maps and 4
> reducers with mapred.reduce.parallel.copies=5. Can each reduce task then
> read from all 4 nodes in parallel, i.e. make only 4 concurrent
> connections, one to each of the 4 nodes? Or can it read from 5 of the 800 map
> outputs, i.e. make at least 2 concurrent connections to a single node?
>
> In essence, I am trying to determine how many reducers would be accessing a
> single disk, concurrently, in any given Hadoop cluster for any job
> configuration as a function of the various parameters that can be specified
> in the configuration files.
>
> Thanks,
> Virajith
>