On Thu, Aug 12, 2010 at 10:31 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett
> <bdennett+[EMAIL PROTECTED]> wrote:
>> From what I've read/seen, it appears that, if not the "default"
>> scheduler, most installations are using Hadoop's Fair Scheduler. Based
>> on features and our requirements, we're leaning towards using the
>> Capacity Scheduler; however, there is some concern that it may not be
>> as "stable" as there doesn't appear to be as much talk about it,
>> compared to the Fair Scheduler.
>> Has anyone hit any nasty issues with regards to the Capacity Scheduler
>> and, in general, are there any "gotchas" to look out for with either
>> scheduler?
>> We're ramping up the number of users on our Hadoop clusters,
>> particularly in regards to Hive. Our goal is to ensure that production
>> processes continue to run with a majority of the cluster during peak
>> usage times, while personal users share the remaining capacity. The
>> Capacity Scheduler's support for queues and for memory-intensive jobs
>> is appealing, but we are curious about drawbacks and/or potential
>> issues.
> FWIW, Yahoo! has been running the capacity scheduler for a reasonably
> long time now. However, there have been many patches to the capacity
> scheduler on top of the base Hadoop 0.20.2 version that make it
> 'stable' and work effectively at large scale. Looking at the change
> log of the Yahoo! Hadoop distribution could give an idea of which
> patches are worth picking up and applying to an older version. The
> good news is that most of these patches have 0.20 versions available
> on JIRA and should apply reasonably cleanly.
Allen cautions that the claim about patches applying cleanly to 0.20
may not hold in practice. Thanks for that heads-up, Allen!
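For anyone following along, the setup Bobby describes (production keeps a majority of the cluster, personal users share the rest) roughly maps to queue capacities in the 0.20-era Capacity Scheduler configuration. A minimal sketch is below; the queue names ("production", "adhoc") and the 80/20 split are illustrative assumptions, not values from this thread, and property names should be checked against the docs for your exact version.

```xml
<!-- mapred-site.xml: enable the Capacity Scheduler and declare queues -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>production,adhoc</value>
</property>

<!-- capacity-scheduler.xml: give production the majority of capacity;
     slots unused by one queue can be borrowed by the other -->
<property>
  <name>mapred.capacity-scheduler.queue.production.capacity</name>
  <value>80</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
  <value>20</value>
</property>
```

Jobs (including Hive queries) are then directed to a queue at submission time, e.g. with -Dmapred.job.queue.name=adhoc or by setting mapred.job.queue.name in the job/session configuration.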