RE: Idle nodes with terasort and MRv2/YARN (0.23.1)
Jeffrey Buell 2012-05-29, 22:10
I ran into the same issue. In the end I gave up and went back to 0.20, where I can specify the number of mappers and reducers per node (6 and 4 in your case). You can try increasing the memory.mb parameters (presumably mapreduce.map.memory.mb and mapreduce.reduce.memory.mb), which should force fewer map/reduce tasks per node, but then you won't be able to run your desired number of both kinds of tasks at the same time. If you find a solution, please let the list know!
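To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not how the YARN scheduler is implemented. It only assumes that a node stops accepting containers once the requested memory adds up to what the NodeManager offers. The 12288 MB figure is a made-up value for a 16GB node (I believe the relevant setting is yarn.nodemanager.resource.memory-mb, but check the name for your release), and the container sizes are illustrative. Any rounding the scheduler applies to requests, and the ApplicationMaster's own container, are ignored.

```python
# Rough model of how container memory requests cap the tasks per node.
# All figures are illustrative assumptions, not values from this thread.

NODE_MB = 12288  # hypothetical memory a NodeManager offers on a 16GB node


def containers_per_node(node_mb: int, container_mb: int) -> int:
    """Upper bound on same-sized containers that fit on one node."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return node_mb // container_mb


def fits_together(node_mb: int, map_mb: int, n_maps: int,
                  reduce_mb: int, n_reduces: int) -> bool:
    """True if n_maps map containers and n_reduces reduce containers
    fit on one node at the same time under this simple memory model."""
    return n_maps * map_mb + n_reduces * reduce_mb <= node_mb


if __name__ == "__main__":
    # Sizing maps at 2048 MB caps a node at 6 maps ...
    print(containers_per_node(NODE_MB, 2048))
    # ... and sizing reduces at 3072 MB caps it at 4 reduces ...
    print(containers_per_node(NODE_MB, 3072))
    # ... but 6 maps plus 4 reduces would need twice the node's memory.
    print(fits_together(NODE_MB, 2048, 6, 3072, 4))
```

Under these assumed numbers, each cap on its own gives the 6 and 4 you want, but both kinds of task draw on the same pool, so the node cannot hold 6 maps and 4 reduces at once.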
> -----Original Message-----
> From: Trevor Robinson [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 29, 2012 2:34 PM
> To: [EMAIL PROTECTED]
> Subject: Idle nodes with terasort and MRv2/YARN (0.23.1)
> I'm trying to tune terasort on a small cluster (4 identical slave
> nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very
> uneven load.
> For teragen, I specify 24 mappers, but for some reason, only 2 nodes
> out of 4 run them all, even though the web UI (for both YARN and HDFS)
> shows all 4 nodes available. Similarly, I specify 16 reducers for
> terasort, but the reducers seem to run on 3 nodes out of 4. Do I have
> something configured wrong, or does the scheduler not attempt to
> spread out the load? In addition to performing sub-optimally, this
> also causes me to run out of disk space for large jobs, since the data
> is not being spread out evenly.
> Currently, I'm using these settings (not shown as XML for brevity):
> In case it's significant, I've scripted the cluster setup and terasort
> jobs, so everything runs back-to-back instantly, except that I poll to
> ensure that HDFS is up and has active data nodes before running
> teragen. I've also tried adding delays, but they didn't seem to have
> any effect, so I don't *think* it's a start-up race issue.
> Thanks for any advice,