MapReduce >> mail # user >> Idle nodes with terasort and MRv2/YARN (0.23.1)


RE: Idle nodes with terasort and MRv2/YARN (0.23.1)
I ran into the same issue. In the end I gave up and went back to 0.20, where I can specify the number of mappers and reducers per node (6 and 4 in your case). You can try increasing the memory.mb parameters, which should force fewer map/reduce tasks per node, but then you won't be able to run your desired number of both kinds of tasks at the same time. If you find a solution, please let the list know!

Jeff
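A rough sketch of the arithmetic behind this suggestion, assuming the scheduler hands out containers purely by memory (the `max_tasks_per_node` helper is hypothetical, and the numbers reuse the settings from Trevor's message below):

```python
# Hypothetical helper: how many task containers fit on one NodeManager
# if the scheduler allocates purely by memory.
def max_tasks_per_node(node_mb, task_mb):
    return node_mb // task_mb

node_mb = 13824  # yarn.nodemanager.resource.memory-mb from the message

# Current settings allow up to 18 maps or 6 reduces per node.
print(max_tasks_per_node(node_mb, 768))   # maps    -> 18
print(max_tasks_per_node(node_mb, 2304))  # reduces -> 6

# To cap a node at roughly 6 maps and 4 reduces, the memory.mb values
# would need to rise to about 13824/6 = 2304 and 13824/4 = 3456.
print(max_tasks_per_node(node_mb, 2304))  # maps after raising    -> 6
print(max_tasks_per_node(node_mb, 3456))  # reduces after raising -> 4
```

This is the trade-off Jeff describes: larger containers cap the per-node task count, but they also reserve memory whether or not a task needs it, so maps and reduces can no longer overlap at the desired concurrency.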

> -----Original Message-----
> From: Trevor Robinson [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 29, 2012 2:34 PM
> To: [EMAIL PROTECTED]
> Subject: Idle nodes with terasort and MRv2/YARN (0.23.1)
>
> Hello,
>
> I'm trying to tune terasort on a small cluster (4 identical slave
> nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very
> uneven load.
>
> For teragen, I specify 24 mappers, but for some reason, only 2 nodes
> out of 4 run them all, even though the web UI (for both YARN and HDFS)
> shows all 4 nodes available. Similarly, I specify 16 reducers for
> terasort, but the reducers seem to run on 3 nodes out of 4. Do I have
> something configured wrong, or does the scheduler not attempt to
> spread out the load? In addition to performing sub-optimally, this
> also causes me to run out of disk space for large jobs, since the data
> is not being spread out evenly.
>
> Currently, I'm using these settings (not shown as XML for brevity):
>
> yarn-site.xml:
> yarn.nodemanager.resource.memory-mb=13824
>
> mapred-site.xml:
> mapreduce.map.memory.mb=768
> mapreduce.map.java.opts=-Xmx512M
> mapreduce.reduce.memory.mb=2304
> mapreduce.reduce.java.opts=-Xmx2048M
> mapreduce.task.io.sort.mb=512
>
> In case it's significant, I've scripted the cluster setup and terasort
> jobs, so everything runs back-to-back instantly, except that I poll to
> ensure that HDFS is up and has active data nodes before running
> teragen. I've also tried adding delays, but they didn't seem to have
> any effect, so I don't *think* it's a start-up race issue.
>
> Thanks for any advice,
> Trevor
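For what it's worth, the settings quoted above make the uneven packing plausible. A minimal sketch (assumptions: the scheduler fills nodes purely by container memory, with no locality or spreading logic):

```python
import math

node_mb = 13824    # yarn.nodemanager.resource.memory-mb
map_mb = 768       # mapreduce.map.memory.mb
reduce_mb = 2304   # mapreduce.reduce.memory.mb

maps_per_node = node_mb // map_mb        # 18 map containers per node
reduces_per_node = node_mb // reduce_mb  # 6 reduce containers per node

# 24 requested mappers need only ceil(24/18) = 2 nodes, and 16 reducers
# need only ceil(16/6) = 3 -- matching the 2-of-4 and 3-of-4 spread
# Trevor observes.
print(math.ceil(24 / maps_per_node))     # 2
print(math.ceil(16 / reduces_per_node))  # 3
```

If this is what's happening, a scheduler that prefers already-busy nodes would never need to touch the remaining nodes at all, which would also explain the uneven HDFS disk usage.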