Re: Reducer granularity and starvation
The one advantage you would get with a large number of reducers is
that the scheduler will be able to hand open reduce slots to other
jobs without having to resort to preemption.

This reduces the risk of losing a reducer 3 hours into a 4-hour run.

-Joey
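
(For reference, a minimal sketch of how the reducer count being discussed is typically set, using the 0.20/0.21-era mapreduce API; the class name, job name, and the figure of 1000 below are illustrative placeholders, not something from this thread:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The old-style property can also be set here, or passed as
    // -D mapred.reduce.tasks=1000 when the driver uses ToolRunner:
    // conf.setInt("mapred.reduce.tasks", 1000);
    Job job = new Job(conf, "big-job");   // 0.20/0.21-era constructor
    // Ask for many more reduce tasks than the cluster's 360 reduce slots,
    // so reducers run in waves; as each wave finishes, the fair scheduler
    // can hand freed slots to other jobs without having to preempt anything.
    job.setNumReduceTasks(1000);
    System.out.println("reduce tasks requested: " + job.getNumReduceTasks());
  }
}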

On Wed, May 18, 2011 at 3:08 PM, James Seigel <[EMAIL PROTECTED]> wrote:
> W.P,
>
> Upping mapred.reduce.tasks to a huge number just means the job will eventually spawn that many reducers.  You still only have slots for 360, so there is no real advantage, UNLESS you are running into OOM errors, which we’ve seen with higher re-use on a smaller number of reducers.
>
> Anyhoo, someone else can chime in and correct me if I am off base.
>
> Does that make sense?
>
> Cheers
> James.
> On 2011-05-18, at 4:04 PM, W.P. McNeill wrote:
>
>> I'm using fair scheduler and JVM reuse.  It is just plain a big job.
>>
>> I'm not using a combiner right now, but that's something to look at.
>>
>> What about bumping the mapred.reduce.tasks up to some huge number?  I think
>> that shouldn't make a difference, but I'm hearing conflicting information on
>> this.
>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
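
(A small sketch, again not from the thread, of the other two knobs W.P. mentions above, JVM reuse and a combiner, using the same era's APIs; SumReducer and the writable types are placeholders, and reusing the reducer as the combiner only works when the reduce is associative and commutative, as a sum is:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerAndJvmReuseSketch {
  // Placeholder reducer that just sums its values per key.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable out = new LongWritable();
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
        throws java.io.IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get();
      out.set(sum);
      ctx.write(key, out);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // -1 reuses each task JVM for an unlimited number of tasks; this is
    // the "JVM reuse" mentioned in the thread (old-style property name).
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
    Job job = new Job(conf, "big-job");
    job.setReducerClass(SumReducer.class);
    // Running the reducer as a combiner shrinks the data shuffled to the
    // reduce slots, which can shorten long-running reduce tasks.
    job.setCombinerClass(SumReducer.class);
  }
}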