MapReduce >> mail # user >> RE: MRBench Maps strange behaviour


Leo Leung 2012-08-29, 17:11
Gaurav Dasgupta 2012-08-28, 10:32
Re: MRBench Maps strange behaviour
Hi,

The number of maps specified to any MapReduce program (including
those that are part of MRBench) is generally only a hint. In typical
cases the actual number of maps is determined by the amount of data
being processed. This wiki page explains more:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
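
To illustrate why the requested value is only a hint, here is a rough sketch of how a file-based input format can turn a single input file into splits. It is a simplified model of my understanding of the classic FileInputFormat logic, not the actual Hadoop code: the function name, the default block size, and the file sizes in the usage lines are all assumptions made up for illustration, and real behaviour varies by Hadoop version and input format.

```python
def estimate_splits(file_size, requested_maps,
                    block_size=64 * 1024 * 1024,
                    min_split_size=1, slop=1.1):
    """Estimate the number of splits (and so map tasks) for one input file.

    Simplified, hypothetical model: the requested map count only sets a
    "goal" split size; the file size decides how many splits result.
    """
    # The hint is used only to derive a target size per split.
    goal_size = file_size // max(requested_maps, 1)
    # The split size is capped by the block size and floored by the minimum.
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Cut full-size splits while clearly more than one split's worth is left.
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    # Whatever is left over becomes one final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # Hypothetical sizes in bytes, chosen only to show the effect.
    print(estimate_splits(1000, 200))  # size divides evenly by the hint
    print(estimate_splits(1004, 200))  # a small remainder adds one split
    print(estimate_splits(2, 10))      # tiny input: far fewer than requested
```

Under this model a file that does not divide evenly by the requested count can yield one map more than requested, and a very small file yields far fewer maps than requested, whatever the hint says.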

In the examples below, the data you generated differs between runs,
so the number of mappers differs too. To be able to judge your
benchmark results, you'd need to benchmark against the same data (or
at least the same kind of data, i.e. the same size and type).

The number of maps printed at the end is taken straight from the input
you specified and doesn't reflect what the job actually ran with. The
number reported in the counters is the correct one.
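
If you want to pull the real figure out of the console output rather than trust the end report, something along these lines could work. It is only a sketch: the regular expression is written against the JobClient log lines quoted below, the function name is made up, and other Hadoop versions may format these lines differently.

```python
import re

# Matches lines such as:
#   12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
# The pattern is an assumption based on the log format shown in this thread.
COUNTER_RE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$"
)


def parse_counters(log_text):
    """Return a dict mapping counter name to its integer value.

    If the log covers several jobs (for example -numRuns 3), values from
    later jobs overwrite earlier ones, so feed it one job at a time.
    """
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


if __name__ == "__main__":
    sample = (
        "12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200\n"
        "12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201\n"
    )
    print(parse_counters(sample)["Launched map tasks"])
```

Comparing the "Launched map tasks" value across runs then tells you what each job actually executed with.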

Thanks
Hemanth

On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> cluster. After executing, I had some strange observations regarding the
> number of Maps it ran.
>
> First I ran the command:
> hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> And I could see that the actual number of Maps it ran was 201 (for all 3
> runs) instead of 200 (though the end report displays the number launched
> as 200). Here is the console report:
>
>
> 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
>
>
>
> Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>
> hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
>
>
>
> This time the actual number of Maps was only 2, and again the end report
> displays the Maps launched as 10. The console output:
>
>
>
> 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
Gaurav Dasgupta 2012-08-29, 07:44
Hemanth Yamijala 2012-08-29, 08:31
Bejoy KS 2012-08-29, 07:50