MapReduce >> mail # user >> RE: MRBench Maps strange behaviour


Leo Leung 2012-08-29, 17:11
MRBench Maps strange behaviour
Hi All,

I ran the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
cluster and noticed some strange behaviour in the number of maps it
launched.

First I ran the command:
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
The job counters show that 201 maps were actually launched in each of the
3 runs instead of 200, although the end report lists the number of maps as
200. Here is the console report:
12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
*12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201*
12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
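
To check this kind of mismatch without reading the log by eye, the counters in the console report above can be extracted with a short script. This is only a sketch: the function names `parse_counters` and `check_map_count` are made up for illustration, and it assumes each counter sits on a single "name=value" line (as in the report above, not wrapped by a mail client).

```python
import re

# Matches one counter line of the JobClient console output, for example:
# "12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201"
# A trailing "*" (plain-text emphasis marker) is tolerated.
COUNTER_RE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\*?\s*$"
)


def parse_counters(log_text):
    """Return {counter name: int value} for every 'name=value' counter line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def check_map_count(log_text, requested_maps):
    """Compare the requested map count with the 'Launched map tasks' counter."""
    launched = parse_counters(log_text).get("Launched map tasks")
    return {
        "requested": requested_maps,
        "launched": launched,
        "matches": launched == requested_maps,
    }


# Sample lines copied from the console report of the first run.
sample = """\
12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
"""

print(check_map_count(sample, requested_maps=200))
# prints {'requested': 200, 'launched': 201, 'matches': False}
```
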

Again, I ran the MRBench for just 10 Maps and 10 Reduces:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

This time the actual number of maps was only 2, and again the end report
displays the maps launched as 10. The console output:

12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
*12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2*
12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
DataLines   Maps   Reduces   AvgTime (milliseconds)
1           20     20        17451
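
The end report can be read the same way, so that its "Maps" column can be put next to the "Launched map tasks" counter of the same run. Again a sketch under assumptions: `parse_mrbench_summary` is a hypothetical helper, and it assumes the report is one header line starting with "DataLines" followed by one row of four integers, as printed above.

```python
def parse_mrbench_summary(report_text):
    """Parse the MRBench end report (header line plus one row) into a dict."""
    # Drop surrounding whitespace and stray "*" emphasis markers, skip blanks.
    lines = [ln.strip(" *\t") for ln in report_text.splitlines()]
    lines = [ln for ln in lines if ln]
    for i, ln in enumerate(lines):
        if ln.startswith("DataLines"):
            values = lines[i + 1].split()
            keys = ["DataLines", "Maps", "Reduces", "AvgTime_ms"]
            return dict(zip(keys, map(int, values)))
    raise ValueError("no MRBench summary found")


# End report copied from the second run.
summary = parse_mrbench_summary("""
DataLines Maps Reduces AvgTime (milliseconds)
1                20     20           17451
""")

launched_maps = 2  # 'Launched map tasks' counter from the same console output

print(summary)
# prints {'DataLines': 1, 'Maps': 20, 'Reduces': 20, 'AvgTime_ms': 17451}
print("maps in report:", summary["Maps"], "maps launched:", launched_maps)
```
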

Can someone please help me understand why Hadoop behaves this way? My main
purpose in running MRBench is to measure the average time for a given
number of maps, reduces, input lines, etc. If the number of maps is not
what I submitted, how can I judge my benchmark results?

Thanks,

Gaurav Dasgupta