HDFS, mail # user - Re: How can I limit reducers to one-per-node?


Re: How can I limit reducers to one-per-node?
Harsh J 2013-02-11, 05:48
The suggestion to add a combiner is meant to reduce the shuffle load
(and perhaps the number of reducers needed?), but it doesn't affect how
a set number of reduce tasks gets scheduled, and the scheduler currently
doesn't care whether you add that step or not.
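To make the distinction concrete, here is a minimal plain-Java simulation (no Hadoop classes; `CombinerSketch` and `shuffledRecords` are hypothetical names) of word-count-style map output with and without local pre-aggregation. It only counts how many records would cross the network in the shuffle. In the Hadoop `Job` API the combiner (`setCombinerClass`) and the reducer count (`setNumReduceTasks`) are separate settings, so a combiner leaves the number of reduce tasks, and therefore what the scheduler has to place, unchanged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop dependency) of what a combiner changes:
// it pre-aggregates each map task's output locally, so fewer records are
// shuffled. The number of reduce tasks is a separate job setting.
class CombinerSketch {

    // Each inner list is the stream of keys emitted by one map task;
    // every key implicitly carries a count of 1 (word-count style).
    static int shuffledRecords(List<List<String>> mapOutputs, boolean useCombiner) {
        int total = 0;
        for (List<String> oneMapTask : mapOutputs) {
            if (useCombiner) {
                // Combiner: sum counts per key inside this map task,
                // so only one record per distinct key is shuffled.
                Map<String, Integer> combined = new HashMap<>();
                for (String key : oneMapTask) {
                    combined.merge(key, 1, Integer::sum);
                }
                total += combined.size();
            } else {
                // No combiner: every emitted (key, 1) pair is shuffled.
                total += oneMapTask.size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = List.of(
            List.of("a", "a", "b", "a"),
            List.of("b", "b", "c"));
        System.out.println("shuffled without combiner: " + shuffledRecords(mapOutputs, false)); // 7
        System.out.println("shuffled with combiner: " + shuffledRecords(mapOutputs, true));     // 4
    }
}
```

The combiner cuts the shuffled records from 7 to 4 here, but nothing in it decides how many reduce tasks exist or which nodes run them.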

On Mon, Feb 11, 2013 at 7:59 AM, David Parks <[EMAIL PROTECTED]> wrote:
> I guess the FairScheduler is doing multiple assignments per heartbeat, hence
> the behavior of multiple reduce tasks per node even when they should
> otherwise be fully distributed.
>
> Will adding a combiner change this behavior? Could you explain more?
>
> Thanks!
>
> David
>
> From: Michael Segel [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 11, 2013 8:30 AM
> To: [EMAIL PROTECTED]
> Subject: Re: How can I limit reducers to one-per-node?
>
> Adding a combiner step first, then reduce?
>
> On Feb 8, 2013, at 11:18 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
> Hey David,
>
> There's no readily available way to do this today (you may be
> interested in MAPREDUCE-199 though), but if your job scheduler isn't
> doing multiple assignments of reduce tasks, then only one is assigned
> per TaskTracker (TT) heartbeat, which gives you almost what you're
> looking for: 1 reduce task per node, assigned round-robin (roughly).
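The difference between one and multiple reduce assignments per heartbeat can be sketched with a toy model. It assumes, unlike a real cluster, that TaskTrackers heartbeat in strict round-robin order. The 3 reduce slots per node come from the original question; the 4-node cluster, the 4-reducer job, and the names `HeartbeatSketch` and `assignReduces` are hypothetical. This is not the FairScheduler's actual code.

```java
import java.util.Arrays;

// Toy model (an idealization, not real JobTracker/FairScheduler code):
// TaskTrackers heartbeat in strict round-robin order, and on each heartbeat
// the scheduler hands the heartbeating node up to `maxPerHeartbeat` of the
// job's pending reduce tasks, limited by that node's free reduce slots.
class HeartbeatSketch {

    // Returns how many of the job's reduce tasks end up on each node.
    static int[] assignReduces(int nodes, int slotsPerNode,
                               int pendingReduces, int maxPerHeartbeat) {
        if (maxPerHeartbeat < 1 || pendingReduces > nodes * slotsPerNode) {
            throw new IllegalArgumentException("invalid arguments");
        }
        int[] perNode = new int[nodes];
        int node = 0;
        while (pendingReduces > 0) {
            int free = slotsPerNode - perNode[node];
            int give = Math.min(Math.min(maxPerHeartbeat, free), pendingReduces);
            perNode[node] += give;
            pendingReduces -= give;
            node = (node + 1) % nodes; // next TaskTracker to heartbeat
        }
        return perNode;
    }

    public static void main(String[] args) {
        // 4 nodes, 3 reduce slots each, a job with 4 reduce tasks.
        System.out.println(Arrays.toString(assignReduces(4, 3, 4, 1))); // [1, 1, 1, 1]
        System.out.println(Arrays.toString(assignReduces(4, 3, 4, 3))); // [3, 1, 0, 0]
    }
}
```

With one assignment per heartbeat every node gets exactly one reducer; once the scheduler may hand out several per heartbeat, the first node to report fills all of its free slots. Real heartbeat order is irregular, which is why the one-per-heartbeat behavior is only roughly round-robin.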
>
> On Sat, Feb 9, 2013 at 9:24 AM, David Parks <[EMAIL PROTECTED]> wrote:
>
> I have a cluster of boxes with 3 reduce slots per node. I want to limit a
> particular job to run only 1 reducer per node.
>
> This job is network IO bound, gathering images from a set of webservers.
>
> My job has certain parameters set to meet “web politeness” standards (e.g.,
> limits on the number of connections and on connection frequency).
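One common way to enforce a connection-frequency limit is a per-host minimum interval between connections. The sketch below assumes that policy; `HostPolitenessLimiter`, `minIntervalMillis`, and the host names are hypothetical and not taken from the original job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host politeness gate: allows a new connection to a host
// only if at least `minIntervalMillis` have passed since the last connection
// this instance allowed to that same host. Time is passed in explicitly so the
// behavior is deterministic; a real fetcher would use System.currentTimeMillis()
// and wait or requeue the URL when tryAcquire returns false.
class HostPolitenessLimiter {
    private final long minIntervalMillis;
    private final Map<String, Long> lastAllowed = new HashMap<>();

    HostPolitenessLimiter(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true (and records the time) if a connection to `host` is
    // allowed at `nowMillis`; returns false if it would be too soon.
    boolean tryAcquire(String host, long nowMillis) {
        Long last = lastAllowed.get(host);
        if (last != null && nowMillis - last < minIntervalMillis) {
            return false;
        }
        lastAllowed.put(host, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        HostPolitenessLimiter limiter = new HostPolitenessLimiter(1000);
        System.out.println(limiter.tryAcquire("img.example.com", 0));    // true
        System.out.println(limiter.tryAcquire("img.example.com", 500));  // false: too soon
        System.out.println(limiter.tryAcquire("cdn.example.org", 500));  // true: different host
        System.out.println(limiter.tryAcquire("img.example.com", 1000)); // true
    }
}
```

The limiter is not thread-safe, and its state lives in one object, i.e. inside one reduce task; nothing coordinates it with other tasks running on the same node.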
>
> If this job runs multiple reducers on the same node, those per-host
> limits will be violated. Also, this is a shared environment and I don’t
> want long-running, network-bound jobs uselessly taking up all reduce slots.
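Why co-located reducers break per-host limits comes down to simple arithmetic, assuming that each reduce task enforces its limit independently, that the webserver identifies clients by source IP address, and, in the worst case, that all of a node's reducers hit the same host at once. The per-task limit of 6 connections per minute and the names `CoLocationSketch` and `worstNodeRate` are hypothetical.

```java
import java.util.Arrays;

// Back-of-the-envelope model: each reduce task enforces its own per-host
// limit of `perTaskLimit` connections per minute, with no coordination between
// tasks. In the worst case, a webserver sees from one node's IP address
// perTaskLimit multiplied by the number of the job's reduce tasks on that node.
class CoLocationSketch {

    // Worst-case connections per minute to a single host from any one node.
    static int worstNodeRate(int[] reducesPerNode, int perTaskLimit) {
        int maxTasks = Arrays.stream(reducesPerNode).max().orElse(0);
        return maxTasks * perTaskLimit;
    }

    public static void main(String[] args) {
        int perTaskLimit = 6; // hypothetical politeness limit per reduce task
        System.out.println(worstNodeRate(new int[]{1, 1, 1, 1}, perTaskLimit)); // 6
        System.out.println(worstNodeRate(new int[]{3, 1, 0, 0}, perTaskLimit)); // 18
    }
}
```

With one reducer per node the per-host limit holds as configured; with three reducers on one node, that node can open connections to the same host at three times the intended rate.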
>
> --
> Harsh J
>
> Michael Segel  | (m) 312.755.9623
>
> Segel and Associates
>

--
Harsh J