YARN schedulers handle this with the concept of "reservations". Scheduling
decisions occur on node heartbeats. When a node that is full heartbeats,
the next application that should be able to place a container on it gets to
place a "reservation" on it. Each node has space for a single reservation.
Containers for other applications will not be placed on the node until the
reservation is fulfilled.
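The heartbeat rule above can be modeled roughly like this. It is a minimal sketch, not the actual Fair Scheduler code: the `App`/`Node` classes and `heartbeat` method are hypothetical, it assumes at most one container is assigned per heartbeat, and it leaves the ordering of applications (fair share, priority, locality) to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    container_gb: int  # size of each container this app asks for
    pending: int       # containers the app still wants


@dataclass
class Node:
    capacity_gb: int
    used_gb: int = 0
    reserved_for: Optional[App] = None  # at most one reservation per node

    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb

    def heartbeat(self, apps_in_order: List[App]) -> Optional[str]:
        """Handle one node heartbeat; return the name of the app given a container."""
        # A reservation blocks the node: only the reserving app may be served.
        if self.reserved_for is not None:
            app = self.reserved_for
            if app.container_gb <= self.free_gb():
                self.used_gb += app.container_gb
                app.pending -= 1
                self.reserved_for = None  # reservation fulfilled
                return app.name
            return None
        # No reservation: serve the next app (in the caller's order) that wants a container.
        for app in apps_in_order:
            if app.pending == 0:
                continue
            if app.container_gb <= self.free_gb():
                self.used_gb += app.container_gb
                app.pending -= 1
                return app.name
            # It doesn't fit right now, so this app reserves the node.
            self.reserved_for = app
            return None
        return None
```

For example, on a full 8GB node that app B has reserved, a freed 4GB slot is left idle even though app A's 4GB container would fit:

```python
a = App("A", container_gb=4, pending=80)
b = App("B", container_gb=8, pending=1)
node = Node(capacity_gb=8, used_gb=8)  # full with two of A's containers
node.heartbeat([b, a])                 # B reserves the node, nothing placed
node.used_gb -= 4                      # one of A's containers finishes
node.heartbeat([a, b])                 # still None: the node is held for B
```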
If you are using the Fair Scheduler (the Capacity Scheduler works similarly,
but I'm not sure of the specifics), this means that app B would get its
container long before app A completed, though not immediately either. After
app A gets its 20 containers, it would also get reservations on the nodes.
When one of app A's containers finishes on a node, app A would place another
container there to fulfill its reservation. App B would then get the
reservation on that node. From then on, no containers would be placed on
that node until app B is able to place its 8GB container, which would only
happen after both of app A's 4GB containers on it finish.
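To put rough numbers on that sequence, here is a small single-node simulation. It is a sketch under simplifying assumptions of mine, not how YARN computes anything: every A container runs for a fixed `run_min` minutes, a heartbeat happens the instant a container finishes, the reservation passes to B as soon as A's reservation is fulfilled after B has arrived (approximating fair-share ordering), and the function name `time_until_b_placed` is hypothetical.

```python
import heapq
from typing import List


def time_until_b_placed(run_min: float, a_start_times: List[float], b_arrival: float,
                        capacity_gb: int = 8, a_gb: int = 4, b_gb: int = 8) -> float:
    """Simulate one node that app A keeps full; return when B's container is placed."""
    finishes = [t + run_min for t in a_start_times]  # end times of A's running containers
    heapq.heapify(finishes)
    used_gb = a_gb * len(a_start_times)
    reserved_for = "A"  # as in the walkthrough, A already holds this node's reservation
    while finishes:
        now = heapq.heappop(finishes)
        used_gb -= a_gb  # one of A's containers finished and freed its memory
        if reserved_for == "A":
            # A fulfills its reservation right away, so the node is full again.
            used_gb += a_gb
            heapq.heappush(finishes, now + run_min)
            # The next reservation goes to B once B has arrived, otherwise to A again.
            reserved_for = "B" if now >= b_arrival else "A"
        elif used_gb + b_gb <= capacity_gb:
            return now  # both of A's containers are gone, so B's 8GB fits
        # Otherwise the freed space sits idle: only B may be placed here.
    raise RuntimeError("B was never placed")
```

With 10-minute A containers started at minutes 0 and 5 and B arriving at minute 1, `time_until_b_placed(10, [0, 5], 1)` gives 20: A refills the slot freed at minute 10, B then reserves, and B's container goes in once the containers ending at 15 and 20 are both done. B waits roughly two container lifetimes rather than until all of app A's 100 containers have run.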
It's also possible to configure the schedulers to use preemption, which
kills running containers to free up resources, to make situations like this
resolve much faster.
Does that make some sense?
On Mon, Sep 9, 2013 at 7:21 AM, John Lilley <[EMAIL PROTECTED]> wrote:
> Do the Hadoop 2.0 YARN scheduler(s) deal with situations like the
> Hadoop cluster of 10 nodes, with 8GB each available for containers. There
> is only one queue.
> Application A requests 100 4GB containers. It initially, or after a
> little while, gets 20 containers.
> Later, application B requests 1 8GB container.
> Suppose that App-A’s containers each take a few minutes. At some point
> one will complete. When that happens, will the scheduler immediately
> allocate another 4GB container to App-A? If so will App-B ever get its
> container until App-A is almost done?