-
Re: What does mapred.map.tasksperslot do?
Hemanth Yamijala 2012-12-27, 08:43
David,
Could you please tell me which version of Hadoop you are using? I don't see this parameter in the stable (1.x) or current branch. I only see references to it in connection with EMR and with Hadoop 0.18 or so.

On Thu, Dec 27, 2012 at 1:51 PM, David Parks <[EMAIL PROTECTED]> wrote:
> I didn't come up with much in a Google search.
>
> In particular, what are the side effects of changing this setting? Memory? Sort process?
>
> I'm guessing it means that it'll feed 2 input splits to each map task, where a map task in turn is a self-contained JVM that consumes one map slot.
>
> Thus 4 map slots and a tasksperslot of 2 would mean 4 map task JVMs, each of which processes 2 input splits at a time.
>
> By increasing tasksperslot, I presume we reduce the overhead of starting a new task (even though we're re-using the JVM in a typical configuration, ours included), but we have more map output to sort and shuffle (I presume the results of both splits go into the same output).
>
> Can someone verify those presumptions?
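To make the arithmetic in David's presumption concrete, here is a small Python sketch. It encodes only his guess (one task JVM per map slot, each JVM working on `tasksperslot` input splits at once), which Hemanth could not confirm for Hadoop 1.x, so it says nothing about what EMR actually does. `MapSlotModel` and `waves` are hypothetical names, and the model ignores JVM reuse, memory, and uneven split sizes.

```python
import math
from dataclasses import dataclass


@dataclass
class MapSlotModel:
    """Hypothetical model of David's presumption about tasksperslot.

    Assumption (unverified; not taken from Hadoop or EMR docs): each map
    slot hosts one task JVM, and that JVM works on `tasks_per_slot`
    input splits at the same time.
    """

    map_slots: int
    tasks_per_slot: int

    def __post_init__(self) -> None:
        if self.map_slots < 1 or self.tasks_per_slot < 1:
            raise ValueError("map_slots and tasks_per_slot must be >= 1")

    def concurrent_jvms(self) -> int:
        # One task JVM per map slot under this assumption.
        return self.map_slots

    def splits_in_flight(self) -> int:
        # Input splits being processed at the same moment across all slots.
        return self.map_slots * self.tasks_per_slot

    def waves(self, total_splits: int) -> int:
        # Rounds needed to cover every split, assuming every split takes
        # the same time; a partial last round still counts as a full one.
        return math.ceil(total_splits / self.splits_in_flight())


model = MapSlotModel(map_slots=4, tasks_per_slot=2)
print(model.concurrent_jvms(), model.splits_in_flight(), model.waves(100))
# prints: 4 8 13
```

With `tasks_per_slot=1` the same 100 splits would take 25 waves instead of 13, so under this model raising the setting cuts the number of task launches, while (if both splits really share one output, as David presumes) each JVM has more map output to sort and shuffle.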
-
RE: What does mapred.map.tasksperslot do?
David Parks 2012-12-27, 09:42
Ah, this is on AWS EMR, Hadoop 1.0.3. Based on my reading of the AWS docs, this could be an EMR-specific feature, but I thought it was part of Hadoop.
All projects made searchable here are trademarks of the Apache Software Foundation. Service operated by Sematext.