I am using Hadoop 0.20.203.
I have performed simple vertical scalability experiments on Hadoop using
graph datasets and a BFS algorithm. My configuration is 20 workers + 1
master. In each test I divided the map slots and reduce slots equally
(M == R); the job (a single BFS step) can be processed in a single wave.
I used wall-clock time as the metric. The total size of the dataset is
around 311 MB. Below are my recorded times (I'll ask the question after
presenting them).
CPUs (slots) per worker :    1     2     3     4     5     6     7
Time                    : 429s  434s  417s  412s  429s  430s  470s

One CPU is always left for the daemons. The times are averages over 10
executions of the algorithm.
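To make the trends easier to read, here is a minimal Python sketch that computes the step-to-step change in time and the speedup relative to 1 CPU. The function and variable names (`step_changes`, `speedups`, `times`) are my own and only for illustration; the numbers are the averaged times listed above.

```python
# Average wall-clock times in seconds, keyed by CPUs (slots) per worker.
# The values are the measurements listed above; the names are illustrative.
times = {1: 429, 2: 434, 3: 417, 4: 412, 5: 429, 6: 430, 7: 470}


def step_changes(times):
    """Return {(a, b): percent change in time when going from a to b CPUs}."""
    cpus = sorted(times)
    return {
        (a, b): 100.0 * (times[b] - times[a]) / times[a]
        for a, b in zip(cpus, cpus[1:])
    }


def speedups(times, baseline=1):
    """Return {cpus: T(baseline) / T(cpus)}; values above 1.0 mean faster."""
    return {n: times[baseline] / t for n, t in times.items()}


if __name__ == "__main__":
    for (a, b), pct in step_changes(times).items():
        print(f"{a} -> {b} CPUs: {pct:+.1f}%")
    for n, s in speedups(times).items():
        print(f"{n} CPUs: speedup {s:.3f}")
```

A positive percentage means the job got slower at that step, a negative one means it got faster.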
As can be seen, the initial trend (1 -> 2 CPUs) is actually an increase in
execution time. A different trend appears from 2 -> 3 CPUs and lasts until
4 CPUs (the time decreases). After that the execution time increases all
the way to 7 CPUs. Can someone explain to me why Hadoop scales like that?
I can understand that with a large number of CPUs the tasks compete for the
bus and local disk I/O. But I cannot explain (do not understand) the change in
Thanks in advance.