Re: Regarding FIFO scheduler
Thanks, got the point. So, the shuffle and sort can happen in parallel even
before all the map tasks are completed, but the reduce happens only after
all the map tasks are complete.

Praveen
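
The slow-start threshold discussed in this thread can be raised from its 0.05
default either cluster-wide or per job. A minimal mapred-site.xml (or per-job
configuration) sketch; the 0.80 value simply mirrors Joey's 80% suggestion
below, and older releases spell the property
mapred.reduce.slowstart.completed.maps:

  <property>
    <!-- Launch reducers only after 80% of the job's map tasks have
         completed; the default is 0.05 (5%). -->
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.80</value>
  </property>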

On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:

> In most cases, your job will have more map tasks than map slots. You
> want the reducers to spin up at some point before all your maps
> complete, so that the shuffle and sort can work in parallel with some
> of your map tasks. I usually set slow start to 80%, sometimes higher
> if I know the maps are slow and they do a lot of filtering, so there
> isn't too much intermediate data.
>
> -Joey
>
> On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Joey,
> >
> > Thanks for the response.
> >
> > 'mapreduce.job.reduce.slowstart.completedmaps' defaults to 0.05 and its
> > description says 'Fraction of the number of maps in the job which should
> > be complete before reduces are scheduled for the job.'
> >
> > Shouldn't the map tasks be completed before the reduce tasks are kicked
> > off for a particular job?
> >
> > Praveen
> >
> > On Thu, Sep 22, 2011 at 6:53 PM, Joey Echeverria <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> The jobs would run in parallel since J1 doesn't use all of your map
> >> slots. Things get more interesting with reduce slots. If J1 is an
> >> overall slower job, and you haven't configured
> >> mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch
> >> of idle reduce tasks which would starve J2.
> >>
> >> In general, it's best to configure the slow start property and to use
> >> the fair scheduler or capacity scheduler.
> >>
> >> -Joey
> >>
> >> On Thu, Sep 22, 2011 at 6:05 AM, Praveen Sripati
> >> <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >
> >> > Let's assume that there are two jobs J1 (100 map tasks) and J2 (200 map
> >> > tasks) and the cluster has a capacity of 150 map tasks (15 nodes with
> >> > 10 map tasks per node) and Hadoop is using the default FIFO scheduler.
> >> > If I submit first J1 and then J2, will the jobs run in parallel or does
> >> > the job J1 have to be completed before the job J2 starts?
> >> >
> >> > I was reading 'Hadoop - The Definitive Guide' and it says "Early versions
> >> > of Hadoop had a very simple approach to scheduling users’ jobs: they ran
> >> > in order of submission, using a FIFO scheduler. Typically, each job would
> >> > use the whole cluster, so jobs had to wait their turn."
> >> >
> >> > Thanks,
> >> > Praveen
> >> >
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
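
Joey's suggestion above to move off the default FIFO scheduler is a
JobTracker-side setting. A minimal mapred-site.xml sketch, assuming Hadoop
0.20/1.x with the fair scheduler jar on the JobTracker classpath (the capacity
scheduler would use org.apache.hadoop.mapred.CapacityTaskScheduler instead):

  <property>
    <!-- Replace the default FIFO task scheduler with the fair scheduler. -->
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>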