On Sat, May 11, 2013 at 8:31 PM, Rahul Bhattacharjee
<[EMAIL PROTECTED]> wrote:
> I was going through the job schedulers of Hadoop and could not see any major
> operational difference between the capacity scheduler and the fair share
> scheduler apart from the fact that fair share scheduler supports preemption
> and capacity scheduler doesn't.
I'd suggest reading the design of both schedulers. The preemption
feature is not the only difference; there are also differences in how
the queues behave and in how tasks from the various queued jobs are
picked for scheduling (i.e., the base algorithm).
> Another thing is the former creates logical pools based on certain attribute
> like username , user group etc and the later has a notion of job queues. Can
> someone point me to any other major differences between these two types of
Note that the FairScheduler can also reuse the queues concept if you
point its pool-name property at the queue-name config property.
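As a minimal sketch (assuming Hadoop 1.x / MR1 property names), this is the kind of mapred-site.xml entry that makes FairScheduler pools mirror the configured job queues instead of the default per-user pools:

```xml
<!-- mapred-site.xml (property names assume Hadoop 1.x / MR1) -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <!-- Default is user.name (one pool per user); pointing it at the
       queue-name property makes pools follow the job queues instead. -->
  <value>mapred.job.queue.name</value>
</property>
```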
> Another question in this regard is that the capacity scheduler uses a FIFO
> queue. So it's still possible that a high priority long running job using all
> the capacity allocated to the queue blocks all the other jobs after it in
> the queue. I think this is the expected behavior, but wanted to confirm.
I think this is the case, yes, if all the queue's capacity is
currently soaked up. However, the CS doesn't wait on job completions
to schedule the next jobs if slots are free (as can happen, say, in a
job's last wave of tasks).
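To make the capacity picture concrete, here is a sketch of queue definitions (again assuming Hadoop 1.x / MR1 property names; the "research" queue name is just an example). Within each queue jobs are served FIFO (optionally ordered by priority), which is why one long-running job can hold its queue's whole capacity:

```xml
<!-- capacity-scheduler.xml (property names assume Hadoop 1.x / MR1) -->
<property>
  <!-- Percentage of cluster slots guaranteed to the "default" queue. -->
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>70</value>
</property>
<property>
  <!-- A second, example queue ("research") with the remaining capacity. -->
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>30</value>
</property>
```

The queues themselves would also need to be declared via mapred.queue.names in mapred-site.xml.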