Thanks for the response.
'mapreduce.job.reduce.slowstart.completedmaps' defaults to 0.05, and its
description says 'Fraction of the number of maps in the job which should be
complete before reduces are scheduled for the job.'
Shouldn't the map tasks be completed before the reduce tasks are kicked off
for a particular job?
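For reference, the fraction can be raised in mapred-site.xml so reducers are not launched until most maps have finished (a sketch only; the value 0.80 here is an illustrative choice, not a recommendation):

```xml
<!-- mapred-site.xml: delay reducer launch until 80% of maps complete.
     0.80 is an example value; tune it for your workload. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>
</property>
```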
On Thu, Sep 22, 2011 at 6:53 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> The jobs would run in parallel since J1 doesn't use all of your map
> tasks. Things get more interesting with reduce slots. If J1 is an
> overall slower job, and you haven't configured
> mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch
> of idle reduce tasks which would starve J2.
> In general, it's best to configure the slow start property and to use
> the fair scheduler or capacity scheduler.
> On Thu, Sep 22, 2011 at 6:05 AM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Lets assume that there are two jobs J1 (100 map tasks) and J2 (200 map
> > tasks) and the cluster has a capacity of 150 map tasks (15 nodes with 10
> > tasks per node) and Hadoop is using the default FIFO scheduler. If I
> > first J1 and then J2, will the jobs run in parallel or the job J1 has to
> > completed before the job J2 starts.
> > I was reading 'Hadoop - The Definitive Guide' and it says "Early
> > versions of Hadoop had a very simple approach to scheduling users' jobs:
> > they ran in order of submission, using a FIFO scheduler. Typically, each
> > job would use the whole cluster, so jobs had to wait their turn."
> > Thanks,
> > Praveen
> Joseph Echeverria
> Cloudera, Inc.