Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Hadoop 0.20.203 vertical scalability


Copy link to this message
-
Hadoop 0.20.203 vertical scalability
Hi

I am using Hadoop 0.20.203.

I have performed simple vertical scalability experiments of Hadoop with the
use of Graph datasets and BFS algorithm. My experiment configuration is
20workers + Master. In each test I divided the Map slots and Reduce Slots
equally (M==R), I can process the Job(single BFS step) in a single wave.
For the experiments I have used the wall clock time metric. The total size
of the dataset is around 311mb. Below are my recorded times (I'll ask the
question after presenting them).

Per worker : 1CPU (1Slot)  2CPU(2Slots) 3CPU  4CPU 5CPU  6CPU 7CPU  // 1
cpu is always left for deamons
Time          : 429s              434s              417s   412s   429s
  430s   470s   // presented times are averages from 10 algorithm executions

As it can be seen initial trend (1->2 cpu) is actually increasing the
execution time. However another trend occurs for the (2->3 cpu), which
lasts till 4cpu (time decrease). After that the execution time increase all
the way till 7cpus. Can some one explain to me why Hadoop scales like that,
I can understand that for the large number of CPUs, tasks compete over BUS,
local disk I/O. But I can not explain (do not understand) the change in
trends.

Thanks in advance.
regards
blah
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB