George Porter 2009-08-17, 18:52
Naber, Chad 2009-08-17, 19:21
The configuration of the 8-node cluster is the same as the 4- and
16-node clusters, with the following exceptions:
- 4-node cluster has 4 map and 4 reduce slots per node
- 8-node cluster has 2 map and 2 reduce slots per node
- 16-node cluster has 1 map and 1 reduce slot per node
- 4-node cluster has 4 local disks dedicated to storing blocks for the DataNode
- 8-node cluster has 2 local disks dedicated to storing blocks for the DataNode
- 16-node cluster has 1 local disk dedicated to storing blocks for the DataNode
In all cases, hadoop.tmp.dir is located on the first disk of each
node. Each JVM has 10GB of heap. The replication factor is set to 1.
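Putting that layout in config terms, a hadoop-site.xml fragment for the 8-node case might look like the following (the property names are the standard Hadoop 0.19 ones; the /mnt paths are only illustrative, not the actual paths on these clusters):

```xml
<!-- 8-node cluster: 2 map + 2 reduce slots per node, 2 data disks -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <!-- illustrative paths: two local disks dedicated to DataNode blocks -->
  <value>/mnt/disk1/dfs/data,/mnt/disk2/dfs/data</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <!-- on the first disk of each node, per the description above -->
  <value>/mnt/disk1/hadoop-tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```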
I noticed that the 8-node cluster was failing every now and then, and
after looking at the Ganglia graphs, it seems that when all the shuffle
phases begin (after the maps all complete) the network becomes
congested, and eventually some of the shuffles give up and fail.
I'll take a look at these io/shuffle configuration parameters and see
if they have an effect. Since the network seems to be the bottleneck,
I may try to play around with mapred.reduce.parallel.copies.
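For reference, that knob can be set cluster-wide in hadoop-site.xml (or per job); lowering it reduces how many map outputs each reducer fetches concurrently, which should ease congestion when all reducers start shuffling at once. A sketch, with a value of 2 as a starting guess rather than a recommendation:

```xml
<!-- Limit concurrent shuffle fetches per reducer; the 0.19 default is 5 -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>2</value>
</property>
```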
On Mon, Aug 17, 2009 at 12:21 PM, Naber, Chad<[EMAIL PROTECTED]> wrote:
> Hi George,
> This is interesting. Does the configuration of the 8 node cluster match the configuration of the 4 or 16 node cluster? There are some configs that can directly affect the shuffle phase of reduce. The defaults live in $HADOOPDIR/conf/hadoop-default.xml; overrides would most likely be set in $HADOOPDIR/conf/hadoop-site.xml.
> The parameters to check out, in part, would be io.sort.factor, mapred.inmem.merge.threshold, mapred.job.shuffle.merge.percent, mapred.job.shuffle.input.buffer.percent, and mapred.job.reduce.input.buffer.percent.
> Check out the "shuffle/reduce parameters" section on http://hadoop.apache.org/common/docs/current/mapred_tutorial.html if that does seem to be the problem.
> -----Original Message-----
> From: George Porter [mailto:[EMAIL PROTECTED]]
> Sent: Monday, August 17, 2009 11:53 AM
> To: [EMAIL PROTECTED]
> Subject: Shuffle phase not starting until 100% of maps are done?
> Hi everyone,
> I've come across a problem running Map/Reduce on an EC2 cluster, and
> was wondering if anyone here had any thoughts to what the issue was.
> I'm running a simple 'sort' M/R job on 40GB from the examples JAR on
> Hadoop 0.19.0 (using the Hadoop 0.19.0 AMI for Amazon EC2 on Extra-Large
> instances). When I run the sort job on a 4 or 16 node cluster, things
> work fine, and I notice that the shuffle phase begins when approx
> 45-50% of the maps are completed. However, when I run the sort job on
> an 8-node cluster, the shuffle doesn't begin until 100% of the maps
> are done. This causes the 8 node cluster to run much slower than I
> would have thought. There are over 2000 map tasks, and 16 map slots
> across those 8 nodes, and so a lot of map tasks have finished before
> the shuffle starts.
> Any thoughts on what would be delaying the start of the shuffle phase?