MapReduce, mail # user - RE: MRBench Maps strange behaviour


RE: MRBench Maps strange behaviour
Leo Leung 2012-08-29, 17:11
The number of map tasks mrbench actually launches depends on the number of inputLines.

So in your first case, you specified more input lines than maps, hence all the maps were launched.

The default inputLines is 1, which is (cough cough) quite oblivious to the number of maps you specify.
(That was your second case)
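
A minimal sketch of what this implies for a run, assuming the jar path used elsewhere in this thread: keep -inputLines at least as large as -maps so that every requested map has input to work on. The helper name mrbench_cmd and the HADOOP_TEST_JAR variable are hypothetical, and the function only prints the command rather than running it.

```shell
# Jar path taken from the commands quoted in this thread; override if yours differs.
HADOOP_TEST_JAR=${HADOOP_TEST_JAR:-/usr/lib/hadoop-0.20/hadoop-test.jar}

# Hypothetical helper: print an mrbench command line.
# Usage: mrbench_cmd MAPS REDUCES [INPUT_LINES]
# INPUT_LINES defaults to MAPS instead of mrbench's own default of 1.
mrbench_cmd() (
  maps=$1
  reduces=$2
  input_lines=${3:-$1}
  if [ "$input_lines" -lt "$maps" ]; then
    echo "warning: fewer input lines than maps; not all maps may be launched" >&2
  fi
  echo "hadoop jar $HADOOP_TEST_JAR mrbench -maps $maps -reduces $reduces -inputLines $input_lines"
)

mrbench_cmd 10 10
```

The printed command can be pasted into a shell once it looks right. The final map count is still decided by the job itself, so the job counters remain the number to check.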
From: praveenesh kumar [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 29, 2012 1:45 AM
To: [EMAIL PROTECTED]
Subject: Re: MRBench Maps strange behaviour

Then the question arises of how MRBench uses its parameters.
According to the mail he sent, he is running MRBench with the following parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he expects MRBench to launch 10 mappers and 10 reducers. But he is getting different results, which are visible in the counters, and we can use all our map and input-split logic to justify the counter outputs.
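
As a rough illustration of that input-split logic, the sketch below mirrors how the classic org.apache.hadoop.mapred.FileInputFormat sizes splits for a single input file, on the assumption that MRBench's job goes through that code path. The function name estimate_splits, the 64 MB default block size, and the byte counts in the usage lines are illustrative assumptions, not values measured from the runs in this thread.

```shell
# Estimate the number of splits for ONE non-empty input file.
# Assumed model: splitSize = max(minSize, min(totalSize / requestedMaps, blockSize)),
# with the final split allowed to be up to 10% larger (slop factor 1.1).
# Usage: estimate_splits TOTAL_BYTES REQUESTED_MAPS [BLOCK_SIZE] [MIN_SIZE]
estimate_splits() (
  total_bytes=$1
  requested_maps=$2
  block_size=${3:-67108864}   # assumed 64 MB block size
  min_size=${4:-1}            # assumed minimum split size of 1 byte

  if [ "$requested_maps" -eq 0 ]; then requested_maps=1; fi
  goal=$(( total_bytes / requested_maps ))

  split=$goal
  if [ "$split" -gt "$block_size" ]; then split=$block_size; fi
  if [ "$split" -lt "$min_size" ]; then split=$min_size; fi

  remaining=$total_bytes
  count=0
  # Integer form of the test: remaining / split > 1.1
  while [ $(( remaining * 10 )) -gt $(( split * 11 )) ]; do
    remaining=$(( remaining - split ))
    count=$(( count + 1 ))
  done
  # Whatever is left over becomes one more split.
  if [ "$remaining" -ne 0 ]; then count=$(( count + 1 )); fi
  echo "$count"
)

estimate_splits 4020 200   # hypothetical 4020-byte input, 200 requested maps
estimate_splits 2 10       # hypothetical 2-byte input, 10 requested maps
```

With these hypothetical sizes, the 4020-byte file and a request for 200 maps gives 201 splits, because the 20-byte remainder becomes an extra split, while the 2-byte file with 10 requested maps gives only 2. That is the general shape of how the requested map count acts as a hint rather than a guarantee.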

The question that arises here is: how can we use MRBench, and what does it provide? How can we control it to run with different parameters to do some benchmarking? Can someone explain how to use MRBench and what exactly it does?

Regards,
Praveenesh
On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
I assume you are asking what the exact number of maps launched is.
If so, the output of the MRBench run prints the counter
"Launched map tasks". That is the exact number of maps launched.

Thanks
Hemanth
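
To read that counter without scanning the whole console report, a small filter along these lines could work. It is a sketch that assumes the job client output keeps the "INFO mapred.JobClient:     <counter>=<value>" shape seen in the console report quoted in this thread; the function name counter_value is hypothetical.

```shell
# Hypothetical helper: read job client output on stdin and print the value
# of the named counter (the last occurrence, if it appears more than once).
# Usage: ... | counter_value "COUNTER NAME"
counter_value() (
  name=$1
  sed -n "s/^.*INFO mapred\.JobClient: *${name}=\([0-9][0-9]*\) *\$/\1/p" | tail -n 1
)

# Sample line copied from the console report quoted in this thread.
printf '%s\n' '12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200' |
  counter_value "Launched reduce tasks"
```

Against a live run the same filter would sit at the end of a pipeline, for example `hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10 2>&1 | counter_value "Launched map tasks"`, assuming the client log is written to standard error (hence the 2>&1).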

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how I can calculate or verify from the counters what the
> exact number of Maps should be?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those that are part of MRBench) is generally only a hint, and the actual
>> number of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers is different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least the same kind of data, i.e. the same size and type).
>>
>> The number of maps printed at the end is straight from the input
>> specified and doesn't reflect what the job actually ran with. The
>> information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
>> > cluster. After executing, I had some strange observations regarding the
>> > number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all 3
>> > runs) instead of 200 (though the end report displays the number launched
>> > as 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all