Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce, mail # user - Shuffle phase not starting until 100% of maps are done?


Copy link to this message
-
Shuffle phase not starting until 100% of maps are done?
George Porter 2009-08-17, 18:52
Hi everyone,

I've come across a problem running Map/Reduce on an EC2 cluster, and
was wondering if anyone here had any thoughts to what the issue was.

I'm running a simple 'sort' M/R job on 40GB from the examples JAR on
Hadoop 19.0 (using the Hadoop 19.0 AMI for Amazon EC2 on Extra-large
images).  When I run the sort job on a 4 or 16 node cluster, things
work fine, and I notice that the shuffle phase begins when approx
45-50% of the maps are completed.  However, when I run the sort job on
an 8-node cluster, the shuffle doesn't begin until 100% of the maps
are done.  This causes the 8 node cluster to run much slower than I
would have thought.  There are over 2000 map tasks, and 16 map slots
across those 8 nodes, and so a lot of map tasks have finished before
the shuffle starts.

Any thoughts on what would be delaying the start of the shuffle phase?

Thanks,
George