When running a job with more reducers than there are containers available in the
cluster, all reducers get scheduled, leaving no containers for the mappers.
The result is starvation: the job never finishes. Is this to be considered a
bug, or is it expected behavior? My workaround is to limit the number of
reducers to fewer than the number of available containers.
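For reference, this is roughly how I apply the workaround, assuming the standard Hadoop property name (mapreduce.job.reduces in newer releases, mapred.reduce.tasks in older ones; the value 12 is just an example below my container count):

```xml
<!-- Cap the reducer count below the number of containers in the cluster,
     so some containers remain free for map tasks.
     Property name depends on the Hadoop version:
       mapreduce.job.reduces   (Hadoop 2.x and later)
       mapred.reduce.tasks     (older releases) -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>12</value>
</property>
```

The same value can of course be set per job from the command line with -D, or programmatically via Job.setNumReduceTasks().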
Also, it seems that tasks are picked and scheduled at random from the combined
pool of pending map and reduce tasks, which leads to suboptimal behavior. For
example, I ran a job with 500 mappers and 30 reducers on a cluster of only 16
machines, with two containers per machine (dual-core machines). What I observe
is that halfway through the job all reduce tasks have been scheduled, leaving
only one container for the 200+ remaining map tasks. Again, is this expected
behavior? If so, what is the idea behind it? And are the map and reduce tasks
indeed scheduled randomly, or does it only look like they are?
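The only related knob I have found is the reduce slow-start setting, which (if I understand it correctly) controls what fraction of map tasks must complete before reducers are launched. A sketch, assuming the Hadoop 2.x property name (mapred.reduce.slowstart.completed.maps in older releases):

```xml
<!-- Delay reducer launch until 80% of map tasks have completed,
     instead of the early launch I am observing. Default is
     typically 0.05 (launch reducers after 5% of maps finish). -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>
</property>
```

If this setting is indeed the right lever here, I would still like to understand why early-launched reducers are allowed to occupy every container and starve the maps.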
Any advice is welcome.