Hemanth Yamijala 2012-12-27, 08:43
Ah, this is on AWS EMR, Hadoop 1.0.3. Based on my reading of the AWS docs,
this could be an AWS-specific feature, but I thought it was part of Hadoop.
From: Hemanth Yamijala [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 27, 2012 3:43 PM
To: [EMAIL PROTECTED]
Subject: Re: What does mapred.map.tasksperslot do?
Could you please tell me which version of Hadoop you are using? I don't see
this parameter in the stable (1.x) or current branch. I only see references
to it with respect to EMR and with Hadoop 0.18 or so.
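
One way to check whether a given cluster sets this property at all is to look for it in the cluster's configuration files. Below is a minimal Python sketch, assuming the standard Hadoop `*-site.xml` layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`). The sample XML, its value, and the helper name `find_property` are hypothetical and for illustration only; they are not taken from an actual EMR cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the standard Hadoop *-site.xml layout.
# The value "2" is made up for illustration.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>mapred.map.tasksperslot</name>
    <value>2</value>
  </property>
</configuration>"""


def find_property(xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(find_property(SAMPLE_SITE_XML, "mapred.map.tasksperslot"))
```

In practice the XML text would be read from the cluster's own configuration file rather than from an inline string.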
On Thu, Dec 27, 2012 at 1:51 PM, David Parks <[EMAIL PROTECTED]> wrote:
I didn't come up with much in a Google search.
In particular, what are the side effects of changing this setting? Memory?
I'm guessing it means that it'll feed 2 input splits to each map task, where
a map task is in turn a self-contained JVM that consumes one map slot.
Thus 4 map slots and 2 tasksperslot would mean 4 map task JVMs, each of which
processes 2 input splits at a time.
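
To make that guess concrete, here is a minimal Python sketch of the arithmetic as I presume it works. It models my presumption only, not confirmed Hadoop or EMR behavior, and the function and parameter names are hypothetical.

```python
def presumed_concurrency(map_slots, tasks_per_slot):
    """Model the presumption: one map task JVM per slot, each JVM
    handling tasks_per_slot input splits at a time."""
    jvms = map_slots                          # one JVM per map slot
    splits_in_flight = jvms * tasks_per_slot  # splits processed concurrently
    return jvms, splits_in_flight


jvms, splits = presumed_concurrency(map_slots=4, tasks_per_slot=2)
print(jvms, splits)  # 4 8
```

If the presumption is wrong, only the body of the function changes; the inputs (slots and tasksperslot) stay the same.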
By increasing tasksperslot, I presume we reduce the overhead needed to start
a new task (even though we're reusing the JVM, as in a typical configuration,
ours included), but we have more map output to sort and shuffle (I presume
the results of both map splits go into the same output).
Can someone verify those presumptions?