MapReduce, mail # user - Suggestions/Info required regarding Hadoop Benchmarking

Gaurav Dasgupta 2012-08-28, 07:01
Hi Users,

I have a 12-node CDH3 cluster on which I plan to run some benchmark
tests. My main intention is to run the benchmarks first with the default
Hadoop configuration, then analyze the results and tune the Hadoop
configuration parameters accordingly to improve my cluster's performance.

Can someone suggest which Hadoop metrics are the most important to
observe during benchmarking?
Also, I have seen that the ratio of average map task execution time to
average reduce task execution time ("Avg Map Tasks" vs. "Avg Reduce Tasks")
is recorded for various benchmarks. How significant is that information
for judging cluster performance, and how would these ratios help me
analyze and tune the cluster to improve its performance?
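For reference, the ratio in question is simple arithmetic over per-task execution times. The sketch below assumes the times (in seconds) have already been collected by hand, for example from the JobTracker web UI; the function name `map_reduce_time_ratio` and the sample numbers are hypothetical and only illustrate the calculation.

```python
from statistics import mean


def map_reduce_time_ratio(map_secs, reduce_secs):
    """Return mean map-task time divided by mean reduce-task time.

    Hypothetical helper: map_secs and reduce_secs are lists of per-task
    execution times in seconds, gathered manually (assumed input format).
    """
    if not map_secs or not reduce_secs:
        raise ValueError("need at least one map and one reduce task time")
    avg_map = mean(map_secs)        # average map task duration
    avg_reduce = mean(reduce_secs)  # average reduce task duration
    return avg_map / avg_reduce


# Illustrative values only: three map tasks and two reduce tasks.
print(map_reduce_time_ratio([30, 34, 32], [120, 136]))  # 0.25
```

Here the average map task takes 32 s and the average reduce task 128 s, so the ratio is 0.25. A very small value like this suggests that the reduce phase dominates job time for that benchmark.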

So far I have run the following benchmarks without tuning the cluster
(i.e., with the default Hadoop configuration):

   - Sort
   - WordCount
   - TeraSort
   - TestDFSIO
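TestDFSIO prints both a "Throughput mb/sec" and an "Average IO rate mb/sec" figure, and the two are easy to confuse. As I understand the TestDFSIO source, throughput is total megabytes divided by total task time, while the average IO rate is the mean of each task's own rate. The sketch below reproduces that arithmetic under that assumption. The `(megabytes, seconds)` per-task input and the name `dfsio_summary` are hypothetical.

```python
import math


def dfsio_summary(tasks):
    """Summarize per-task I/O the way TestDFSIO's report is assumed to.

    tasks: list of (megabytes, seconds) tuples, one per map task
    (hypothetical input format).
    Returns (throughput, average IO rate, IO rate std deviation) in MB/s.
    """
    total_mb = sum(mb for mb, _ in tasks)
    total_s = sum(s for _, s in tasks)
    throughput = total_mb / total_s  # aggregate MB over aggregate time

    rates = [mb / s for mb, s in tasks]  # each task's own MB/s
    avg_rate = sum(rates) / len(rates)
    mean_sq = sum(r * r for r in rates) / len(rates)
    std_dev = math.sqrt(abs(mean_sq - avg_rate ** 2))
    return throughput, avg_rate, std_dev


# Illustrative values only: two tasks writing 100 MB each, one twice as slow.
print(dfsio_summary([(100, 10), (100, 20)]))
```

With one slow task, throughput (about 6.67 MB/s) falls below the average IO rate (7.5 MB/s), and the standard deviation (2.5 MB/s) flags the imbalance between tasks.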

Please also suggest which other benchmarks I should run, especially
those in "hadoop-test.jar" in the $HADOOP_HOME directory, and what each of
those jobs is used for.

Gaurav Dasgupta