The one advantage you would get with a large number of reducers is
that the scheduler will be able to give open reduce slots to other
jobs as your shorter tasks finish, without having to preempt anything.
This will reduce the risk of you losing a reducer 3 hours into a 4-hour run.
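Here is a rough back-of-the-envelope sketch of that trade-off. It takes the 360 slots and the 4-hour run from this thread, and it assumes (my simplification, not a measurement) that the reduce work splits evenly across reducers with no per-task startup or shuffle overhead. The function name reduce_plan and its parameters are made up for illustration.

```python
import math

def reduce_plan(num_reducers, slots=360, total_slot_hours=360 * 4.0):
    """Idealized reduce phase: returns (waves, hours_per_task).

    Assumes the total work (slots * 4 hours) divides evenly among reducers
    and ignores task startup and shuffle costs.
    """
    waves = math.ceil(num_reducers / slots)          # rounds of tasks needed to cover all reducers
    hours_per_task = total_slot_hours / num_reducers  # also the most work a single killed task can lose
    return waves, hours_per_task

for n in (360, 3600):
    waves, hours = reduce_plan(n)
    print(f"{n} reducers: {waves} wave(s), ~{hours:.1f} h per task")
```

Under those assumptions the total slot-hours stay the same, but with 3600 reducers a slot frees up roughly every 0.4 hours instead of every 4, and a killed reducer costs at most one short task. Real jobs pay some overhead per task, so treat the numbers as an upper bound on the benefit.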
On Wed, May 18, 2011 at 3:08 PM, James Seigel <[EMAIL PROTECTED]> wrote:
> Upping mapred.reduce.tasks to a huge number just means that it will eventually spawn that many reducers. You still only have slots for 360, so there is no real advantage, UNLESS you are running into OOM errors, which we’ve seen with higher JVM re-use on a smaller number of reducers.
> Anyhoo, someone else can chime in and correct me if I am off base.
> Does that make sense?
> On 2011-05-18, at 4:04 PM, W.P. McNeill wrote:
>> I'm using fair scheduler and JVM reuse. It is just plain a big job.
>> I'm not using a combiner right now, but that's something to look at.
>> What about bumping the mapred.reduce.tasks up to some huge number? I think
>> that shouldn't make a difference, but I'm hearing conflicting information on