NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> Shuffle phase not starting until 100% of maps are done?
Shuffle phase not starting until 100% of maps are done?
Hi everyone,

I've come across a problem running Map/Reduce on an EC2 cluster and
was wondering if anyone here had any thoughts on what the issue was.

I'm running the simple 'sort' M/R job from the examples JAR on 40 GB of
data on Hadoop 0.19.0 (using the Hadoop 0.19.0 AMI for Amazon EC2 on
Extra Large instances). When I run the sort job on a 4- or 16-node
cluster, things work fine, and I notice that the shuffle phase begins
when approximately 45-50% of the maps have completed. However, when I
run the sort job on an 8-node cluster, the shuffle doesn't begin until
100% of the maps are done. This makes the 8-node cluster run much
slower than I would expect. There are over 2000 map tasks and 16 map
slots across those 8 nodes, so a large number of map tasks have
finished before the shuffle starts.
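A rough back-of-the-envelope sketch of how much map work overlaps with the shuffle in each case. It assumes exactly 2000 map tasks (the post says "over 2000"), uniform map task durations, and each of the 16 slots running one task at a time; the function names are hypothetical and only model the arithmetic, not Hadoop's actual scheduler.

```python
import math


def map_waves(num_maps: int, map_slots: int) -> int:
    """Number of 'waves' needed to run all maps when each slot runs one task at a time."""
    return math.ceil(num_maps / map_slots)


def maps_done_before_shuffle(num_maps: int, start_fraction: float) -> int:
    """Map tasks already finished when reducers begin copying output,
    assuming the shuffle starts once `start_fraction` of maps have completed."""
    return math.floor(num_maps * start_fraction)


def overlap_waves(num_maps: int, map_slots: int, start_fraction: float) -> int:
    """Map waves still to run after the shuffle starts, i.e. waves during which
    copying map output can overlap with ongoing map work."""
    remaining = num_maps - maps_done_before_shuffle(num_maps, start_fraction)
    return math.ceil(remaining / map_slots)


NUM_MAPS = 2000   # assumption: the post says "over 2000"
MAP_SLOTS = 16    # 16 map slots across the 8 nodes, per the post

print(map_waves(NUM_MAPS, MAP_SLOTS))                # 125
print(maps_done_before_shuffle(NUM_MAPS, 0.5))       # 1000
print(overlap_waves(NUM_MAPS, MAP_SLOTS, 0.5))       # 63
print(overlap_waves(NUM_MAPS, MAP_SLOTS, 1.0))       # 0
```

Under these assumptions, a shuffle that starts at about 50% overlaps with roughly 63 of the 125 map waves, whereas a shuffle that starts at 100% overlaps with none, so all copying is pushed to the end of the job.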

Any thoughts on what would be delaying the start of the shuffle phase?

Thanks,
George
Naber, Chad 2009-08-17, 19:21
George Porter 2009-08-17, 19:43