What is your minimum container size, i.e. yarn.scheduler.minimum-allocation-mb?
I'd bump it up to at least 1 GB and use the CapacityScheduler for performance tests.
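As a rough sketch of those two changes, a yarn-site.xml fragment might look like the following. The value 1024 is simply the 1 GB floor suggested above, not a tuned number, and the scheduler class shown is the usual CapacityScheduler class name. Treat both property names as assumptions to verify against the yarn-default.xml shipped with your build, since they have changed between releases.

```xml
<!-- yarn-site.xml (fragment); illustrative values, verify names against your build -->
<property>
  <!-- Smallest container the scheduler will hand out, in MB -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Select the CapacityScheduler for the ResourceManager -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

The ResourceManager has to be restarted for scheduler settings like these to take effect.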
In the case of teragen, the job has no locality at all (it is just generating data from 'random' input splits), so with that many containers fitting on each node, the maps end up packed onto fewer nodes.
The reduces should be better spread if you are using the CapacityScheduler and have MAPREDUCE-3641 (https://issues.apache.org/jira/browse/MAPREDUCE-3641) in your build, i.e. hadoop-0.23.1 or hadoop-2.0.0-alpha (I'd use the latter).
Also, FYI, the CapacityScheduler currently makes the tradeoff of treating node-locality as almost the same as rack-locality, so you might see maps not spread out for terasort. I'll fix that one soon.
On May 29, 2012, at 2:33 PM, Trevor Robinson wrote:
> I'm trying to tune terasort on a small cluster (4 identical slave
> nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very
> uneven load.
> For teragen, I specify 24 mappers, but for some reason, only 2 nodes
> out of 4 run them all, even though the web UI (for both YARN and HDFS)
> shows all 4 nodes available. Similarly, I specify 16 reducers for
> terasort, but the reducers seem to run on 3 nodes out of 4. Do I have
> something configured wrong, or does the scheduler not attempt to
> spread out the load? In addition to performing sub-optimally, this
> also causes me to run out of disk space for large jobs, since the data
> is not being spread out evenly.
> Currently, I'm using these settings (not shown as XML for brevity):
> In case it's significant, I've scripted the cluster setup and terasort
> jobs, so everything runs back-to-back instantly, except that I poll to
> ensure that HDFS is up and has active data nodes before running
> teragen. I've also tried adding delays, but they didn't seem to have
> any effect, so I don't *think* it's a start-up race issue.
> Thanks for any advice,
Arun C. Murthy