MapReduce >> mail # user >> RE: MRBench Maps strange behaviour


Leo Leung 2012-08-29, 17:11
MRBench Maps strange behaviour
Hi All,

I ran the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
cluster and noticed something strange about the number of map tasks it
launched.

First I ran the command:
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
The job actually launched 201 map tasks in each of the 3 runs instead of
200, although the end report shows 200 maps. Here is the console output:
12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
*12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201*
12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882

Then I ran MRBench again with just 10 maps and 10 reduces:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

This time only 2 map tasks actually ran, yet the end report again shows
10 maps launched. The console output:

12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
*12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2*
12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
DataLines  Maps  Reduces  AvgTime (milliseconds)
1          20    20       17451

Can someone please help me understand why Hadoop behaves this way? My main
purpose in running MRBench is to measure the average time for a given
number of maps, reduces, input lines, etc. If the number of maps that
actually run is not the number I submitted, how can I judge my benchmark
results?

Thanks,

Gaurav Dasgupta
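
One way to check each run is to pull the counters out of the JobClient console output and compare the launched map count with the value passed to -maps. The following is a minimal Python sketch under stated assumptions, not part of MRBench or Hadoop: the helper names parse_counters and check_maps are made up for illustration, and the sample lines are copied from the first run above.

```python
import re

# Matches JobClient counter lines of the form
# "<date> <time> INFO mapred.JobClient:     <counter name>=<integer>"
# and tolerates a trailing "*" left over from mail emphasis.
COUNTER_RE = re.compile(
    r"mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*\*?\s*$"
)

# Sample lines copied from the first run's console output.
SAMPLE = """\
12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
*12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201*
12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
"""


def parse_counters(console_output):
    """Return {counter name: int value} for every 'name=value' JobClient line."""
    counters = {}
    for line in console_output.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def check_maps(console_output, requested_maps):
    """Return (launched map tasks, whether that equals the requested count)."""
    launched = parse_counters(console_output).get("Launched map tasks")
    return launched, launched == requested_maps


if __name__ == "__main__":
    launched, ok = check_maps(SAMPLE, requested_maps=200)
    print("launched maps:", launched, "matches -maps:", ok)
```

On the first run this reports 201 launched maps against 200 requested. The same counters show Data-local (64) plus Rack-local (137) adding up to 201, so the console output is at least consistent with itself.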
Hemanth Yamijala 2012-08-29, 05:56
Gaurav Dasgupta 2012-08-29, 07:44
Hemanth Yamijala 2012-08-29, 08:31
Bejoy KS 2012-08-29, 07:50