Re: How can I limit reducers to one-per-node?
What about adding a combiner step first, then the reduce?
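
If you go the combiner route, wiring one in is a single call on the Job. A minimal sketch against the Hadoop 1.x mapreduce API (the WordCount-style mapper/reducer here are illustrative, not from this thread):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerExample {

  // Emits (word, 1) for each token in a line of input.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        ctx.write(word, ONE);
      }
    }
  }

  // Sums counts. Because addition is associative and commutative, the
  // same class is safe to reuse as the combiner (Hadoop may invoke a
  // combiner zero, one, or many times per map output).
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "combiner-example");
    job.setJarByClass(CombinerExample.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class); // map-side pre-aggregation
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note, though, that a combiner only pre-aggregates map output on the map side; it doesn't change how many reduce tasks the scheduler places on a node.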
On Feb 8, 2013, at 11:18 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hey David,
>
> There's no readily available way to do this today (you may be
> interested in MAPREDUCE-199, though), but if your job scheduler isn't
> doing multiple assignments of reduce tasks, then only one is assigned
> per TT heartbeat, which gives you almost what you're looking for: 1
> reduce task per node, round-robined (roughly).
>
> On Sat, Feb 9, 2013 at 9:24 AM, David Parks <[EMAIL PROTECTED]> wrote:
>> I have a cluster of boxes with 3 reducers per node. I want to limit a
>> particular job to only run 1 reducer per node.
>>
>> This job is network I/O bound, gathering images from a set of webservers.
>>
>> My job has certain parameters set to meet “web politeness” standards
>> (e.g. limit connections and connection frequency).
>>
>> If this job runs from multiple reducers on the same node, those per-host
>> limits will be violated. Also, this is a shared environment, and I don’t
>> want long-running, network-bound jobs uselessly taking up all reduce slots.
>
> --
> Harsh J
>

Michael Segel  | (m) 312.755.9623

Segel and Associates
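
For the scheduler behavior Harsh describes, these are the MRv1 knobs usually involved. A sketch only: the property names are Hadoop 1.x, the first applies only when the Fair Scheduler is in use, and both are cluster-level settings (they belong in mapred-site.xml on the cluster, not in a per-job configuration):

import org.apache.hadoop.conf.Configuration;

public class ReduceSlotKnobs {
  public static void main(String[] args) {
    // Shown here only to name the keys: in practice these go in
    // mapred-site.xml, since they configure the JobTracker's Fair
    // Scheduler and the TaskTrackers themselves.
    Configuration conf = new Configuration();

    // Fair Scheduler only: don't assign multiple tasks per TaskTracker
    // heartbeat, which round-robins reduce tasks across nodes.
    conf.setBoolean("mapred.fairscheduler.assignmultiple", false);

    // Hard cap on reduce slots per TaskTracker. Cluster-wide and applies
    // to every job, which is exactly the limitation David is hitting.
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
  }
}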