-
How can I limit reducers to one-per-node?
David Parks 2013-02-09, 03:54
I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node.
This job is network IO bound, gathering images from a set of webservers.
My job has certain parameters set to meet "web politeness" standards (e.g. limit connects and connection frequency).
If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don't want long running network bound jobs uselessly taking up all reduce slots.
-
Re: How can I limit reducers to one-per-node?
Nan Zhu 2013-02-09, 03:59
I think setting mapred.tasktracker.reduce.tasks.maximum to 1 may meet your requirement.
Best,
-- Nan Zhu School of Computer Science, McGill University
-
RE: How can I limit reducers to one-per-node?
David Parks 2013-02-09, 04:24
Hmm, odd, I’m using AWS Mapreduce, and this property is already set to 1 on my cluster by default (using 15 m1.xlarge boxes which come with 3 reducer slots configured by default).
-
Re: How can I limit reducers to one-per-node?
Nan Zhu 2013-02-09, 04:30
I haven't used AWS MR before. If your instances are configured with 3 reducer slots, it means that 3 reducers can run at the same time on that node.
What do you mean by "this property is already set to 1 on my cluster"?
This value can actually be node-specific. If AWS MR instances allow it, you can modify mapred-site.xml on each node to change it from 3 to 1.
Best,
-- Nan Zhu School of Computer Science, McGill University
-
RE: How can I limit reducers to one-per-node?
David Parks 2013-02-09, 04:46
Looking at the Job File for my job I see that this property is set to 1, however I have 3 reducers per node (I’m not clear what configuration is causing this behavior).
My problem is that, on a 15 node cluster, I set 15 reduce tasks on my job, in hopes that each would be assigned to a different node, but in the last run 3 nodes had nothing to do, and 3 other nodes had 2 reduce tasks assigned.
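The distribution described above can be checked against a per-node slot cap with a few lines of arithmetic. This is only a sketch: `fits_slot_cap` and `observed` are hypothetical names, the 9 nodes with one task each are inferred by subtraction (15 tasks minus the 6 on the doubled-up nodes), and it assumes the doubled-up tasks ran concurrently, which the thread does not confirm.

```python
def fits_slot_cap(tasks_per_node, slots_per_node):
    """Return True if no node holds more concurrent reduce tasks than it has slots."""
    return all(n <= slots_per_node for n in tasks_per_node)


# Reported run: 15 nodes, 15 reduce tasks.
# 3 nodes idle, 3 nodes with 2 tasks, and (inferred) 9 nodes with 1 task.
observed = [0] * 3 + [2] * 3 + [1] * 9

num_nodes = len(observed)    # 15
num_tasks = sum(observed)    # 15

# With 3 reduce slots per node the reported placement is legal,
# so nothing in the slot model forces a one-per-node spread.
allowed_with_3_slots = fits_slot_cap(observed, 3)

# With an effective cap of 1 slot per node it would not be possible.
allowed_with_1_slot = fits_slot_cap(observed, 1)
```

Under these assumptions the placement is valid with 3 slots and invalid with 1, which suggests the nodes were effectively running with more than one reduce slot regardless of the value shown in the job file.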
-
Re: How can I limit reducers to one-per-node?
Nan Zhu 2013-02-09, 04:59
Were the nodes with 2 reducers running both reduce tasks at the same time? If yes, I think you can change mapred-site.xml as I suggested.
If no, i.e. your goal is to make all nodes take the same number of tasks over the life cycle of the job, I don't know of any provided property that can do this.
Best,
-- Nan Zhu School of Computer Science, McGill University
-
Re: How can I limit reducers to one-per-node?
Harsh J 2013-02-09, 05:18
Hey David,
There's no readily available way to do this today (you may be interested in MAPREDUCE-199 though) but if your Job scheduler's not doing multiple-assignments on reduce tasks, then only one is assigned per TT heartbeat, which gives you almost what you're looking for: 1 reduce task per node, round-robin'd (roughly).
-- Harsh J
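
The "one reduce per TT heartbeat" behaviour described above can be illustrated with a toy simulation. This is a rough model, not Hadoop's scheduler code: `assign_reduces`, the node names, and the heartbeat orderings are all hypothetical, and tasks are assumed to be long-running so a slot is never freed during assignment.

```python
from collections import Counter


def assign_reduces(heartbeats, num_tasks, slots_per_node, max_per_heartbeat=1):
    """Hand out pending reduce tasks as TaskTracker heartbeats arrive.

    heartbeats: node names in arrival order (hypothetical input).
    Returns a Counter mapping node -> number of reduce tasks assigned.
    """
    assigned = Counter()
    pending = num_tasks
    for node in heartbeats:
        if pending == 0:
            break
        free = slots_per_node - assigned[node]
        # Grant at most max_per_heartbeat tasks, limited by free slots and pending work.
        grant = min(max_per_heartbeat, free, pending)
        if grant > 0:
            assigned[node] += grant
            pending -= grant
    return assigned


nodes = ["node0", "node1", "node2", "node3"]

# Every node heartbeats once before any node heartbeats twice: one task each.
even = assign_reduces(nodes, num_tasks=4, slots_per_node=3)

# node0 heartbeats again before node2 and node3: it picks up a second task
# and node3 is left idle.
skewed = assign_reduces(["node0", "node1", "node0", "node2", "node3"],
                        num_tasks=4, slots_per_node=3)

# A scheduler that assigns multiple reduces per heartbeat packs nodes instead.
packed = assign_reduces(nodes, num_tasks=4, slots_per_node=3, max_per_heartbeat=3)
```

In this model the spread is even only when heartbeats happen to interleave perfectly. An early repeat heartbeat from one node is enough to double it up and leave another idle, which is consistent with the "roughly" round-robin caveat and with the uneven run reported earlier in the thread.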