How does hadoop decide how many reducers to run?
Roy Smith 2013-01-11, 22:59
I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances. Each instance is 8 cores, so 32 cores total. Hadoop ran 16 reducers, followed by a second wave of 12. It seems to me it was only using half the available cores. Is this normal? Is there some way to force it to use all the cores?
--- Roy Smith [EMAIL PROTECTED]
Re: How does hadoop decide how many reducers to run?
Michael Segel 2013-01-11, 23:20
Hi,
First, not enough information.
1) EC2 got it.
2) Which flavor of Hadoop? Is this EMR as well?
3) How many slots did you configure in your mapred-site.xml?
AWS EC2 cores aren't going to be hyperthreaded cores, so 8 cores per node means that you will probably have 6 cores available for slots. With 16 reducers running at once across 4 nodes, it sounds like each node has 4 map slots and 4 reduce slots, or 8 slots in total. (Oversubscription is OK if you're not running HBase.)
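The slot arithmetic above can be sketched in a few lines of Python. This is only an illustration, and it assumes that the first wave of reducers filled every reduce slot and that the slots are spread evenly over the nodes. The function name is hypothetical. In Hadoop 1.x the per-node reduce limit itself comes from the mapred.tasktracker.reduce.tasks.maximum property.

```python
def reduce_slots_per_node(first_wave_reducers: int, nodes: int) -> int:
    """Infer reduce slots per node from the size of a full first reducer wave.

    Assumes (hypothetically) that the first wave used every reduce slot and
    that all nodes have the same number of reduce slots.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    slots, remainder = divmod(first_wave_reducers, nodes)
    if remainder:
        raise ValueError("first wave does not divide evenly across nodes")
    return slots


# Figures from the thread: 4 nodes, 16 reducers in the first wave.
per_node = reduce_slots_per_node(first_wave_reducers=16, nodes=4)
print(per_node)      # 4 reduce slots per node
print(per_node * 4)  # 16 reduce slots cluster-wide
```

With 4 reduce slots and, by the same guess, 4 map slots per node, each 8-core node would carry 8 slots in total.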
So what are you missing?

On Jan 11, 2013, at 4:59 PM, Roy Smith <[EMAIL PROTECTED]> wrote:
> I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances. Each instance is 8 cores, so 32 cores total. Hadoop ran 16 reducers, followed by a second wave of 12. It seems to me it was only using half the available cores. Is this normal? Is there some way to force it to use all the cores?
>
> ---
> Roy Smith
> [EMAIL PROTECTED]
Re: How does hadoop decide how many reducers to run?
Roy Smith 2013-01-11, 23:53
On Jan 11, 2013, at 6:20 PM, Michael Segel wrote:
> Hi,
>
> First, not enough information.
>
> 1) EC2 got it.
> 2) Which flavor of Hadoop? Is this EMR as well?

Yes, EMR. We're running AMI version 2.3.1, which includes hadoop 1.0.3.

> 3) How many slots did you configure in your mapred-site.xml?
Hmmm, no clue. I've never even heard of that file. We're using mrjob. It may be that mrjob is building a mapred-site.xml file for me and I never even see it?
--- Roy Smith [EMAIL PROTECTED]
Re: How does hadoop decide how many reducers to run?
Michael Segel 2013-01-12, 14:05
Since you are using EMR, AWS preconfigures the number of slots per node. So you are already getting the optimum number of slots that their 'machines' can handle.
So when you run your job, you said that you saw 16 reducers and then 12 reducers running.
This could imply that your job required 28 reducers and it was using the full resources of the machines.
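As a rough check of that reading, here is a small Python sketch of how 28 reducers would split into waves on 16 reduce slots. It is an idealized model, not the actual scheduler, which hands out tasks as individual slots free up. The function name is hypothetical, and the figure of 16 cluster-wide reduce slots is inferred from the thread rather than read from the EMR configuration.

```python
def reducer_waves(total_reducers: int, reduce_slots: int) -> list[int]:
    """Split a job's reducers into idealized waves, one reducer per slot."""
    if reduce_slots <= 0:
        raise ValueError("reduce_slots must be positive")
    waves = []
    remaining = total_reducers
    while remaining > 0:
        running = min(remaining, reduce_slots)  # a wave can't exceed the slots
        waves.append(running)
        remaining -= running
    return waves


# 28 reducers on 16 reduce slots (4 nodes x 4 slots, as guessed in the thread).
print(reducer_waves(28, 16))  # [16, 12]
```

Under these assumptions the observed pattern of 16 reducers followed by 12 is what a 28-reducer job would produce when every reduce slot is in use during the first wave.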
On Jan 11, 2013, at 5:53 PM, Roy Smith <[EMAIL PROTECTED]> wrote:
> On Jan 11, 2013, at 6:20 PM, Michael Segel wrote:
>
>> Hi,
>>
>> First, not enough information.
>>
>> 1) EC2 got it.
>> 2) Which flavor of Hadoop? Is this EMR as well?
>
> Yes, EMR. We're running AMI version 2.3.1, which includes hadoop 1.0.3.
>
>> 3) How many slots did you configure in your mapred-site.xml?
>
> Hmmm, no clue. I've never even heard of that file. We're using mrjob. It may be that mrjob is building a mapred-site.xml file for me and I never even see it?
>
> ---
> Roy Smith
> [EMAIL PROTECTED]