
Re: Scheduler recommendation

On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett
<bdennett+[EMAIL PROTECTED]> wrote:
> From what I've read/seen, it appears that, if not the "default"
> scheduler, most installations are using Hadoop's Fair Scheduler. Based
> on features and our requirements, we're leaning towards using the
> Capacity Scheduler; however, there is some concern that it may not be
> as "stable" as there doesn't appear to be as much talk about it,
> compared to the Fair Scheduler.
> Has anyone hit any nasty issues with regards to the Capacity Scheduler
> and, in general, are there any "gotchas" to look out for with either
> scheduler?
> We're ramping up the number of users on our Hadoop clusters,
> particularly in regards to Hive. Our goal is to ensure that production
> processes continue to run with a majority of the cluster during peak
> usage times, while personal users share the remaining capacity. The
> Capacity Scheduler's support of queues and for memory-intensive jobs
> is appealing but we are curious about drawbacks and/or potential
> issues.

FWIW, Yahoo! has been running the Capacity Scheduler for a reasonably
long time now. However, many patches have been applied to the Capacity
Scheduler on top of the base Hadoop 0.20.2 version to make it 'stable'
and to work effectively at large scale. Looking at the change log of
the Yahoo! Hadoop distribution could give an idea of which patches are
useful to pick up and apply to an older version. The good news is that
most of these patches have 0.20 versions available on JIRA that would
apply reasonably cleanly.

> Thanks in advance,
> -Bobby