Re: Number of Maps running more than expected
Gaurav Dasgupta 2012-08-17, 09:14
Hi Anil,

The speculative execution property has been off from the beginning.
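For reference, this is how it is turned off; a minimal mapred-site.xml
sketch, using the Hadoop 0.20-era property names that CDH3 ships with:

  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
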
In addition to my previous mail, I would like to add some more points:

I have checked the same while generating 100 GB of data from RandomTextWriter,
where "hadoop fs -dus <hdfs output dir>" reports 102.65 GB.
So if I calculate the number of Maps for running WordCount on this, it should
be ((102.65 * 1024) MB / 128 MB) = 821.2, so 822 Maps should run. But the
actual number of Maps that run is 900, i.e., 78 extra.
Also, the number of Maps run by the above RandomTextWriter job was 100.
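
For what it's worth, here is one way the 900 could arise; this is only a
sketch under an assumption I have not verified: FileInputFormat computes
splits per file (splits never cross file boundaries), and RandomTextWriter
wrote one output file per Map, i.e., 100 files of roughly equal size:

  // Hypothetical per-file split counting, assuming splits never cross
  // file boundaries and the 102.65 GB is spread evenly over 100 files.
  long blockMB = 128;
  double totalMB = 102.65 * 1024;      // 105113.6 MB, from hadoop fs -dus
  int files = 100;                     // one output file per RandomTextWriter Map
  double fileMB = totalMB / files;     // ~1051.1 MB per file
  long splitsPerFile = (long) Math.ceil(fileMB / blockMB); // ceil(8.21) = 9
  long totalMaps = files * splitsPerFile;                  // 100 * 9 = 900

Under that assumption, each file's last partial block gets its own split:
800 full blocks plus 100 tails = 900 Maps, while a single 100 GB file would
have only one tail, which matches the 800 from my original mail below.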

The same does not happen if I generate the data using TeraGen (hadoop jar
hadoop-examples.jar teragen 10000 <output_dir>) and then perform WordCount
on it: that gives me 2 Maps (note: the number of Maps for TeraGen was also 2).
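(Rough arithmetic for the TeraGen case, assuming TeraGen's fixed 100-byte
rows and one output file per TeraGen Map: 10000 rows * 100 bytes = ~1 MB,
written as 2 files; each file is far below one 128 MB block, so 1 split per
file = 2 WordCount Maps.)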
Please find attached the screenshots of the JobTracker UI for
RandomTextWriter and WordCount for your reference.

Regards,
Gaurav Dasgupta
On Thu, Aug 16, 2012 at 7:57 PM, Anil Gupta <[EMAIL PROTECTED]> wrote:

>  Hi Gaurav,
>
> Did you turn off speculative execution?
>
> Best Regards,
> Anil
>
> On Aug 16, 2012, at 7:13 AM, Gaurav Dasgupta <[EMAIL PROTECTED]> wrote:
>
>   Hi users,
>
> I am working on a CDH3 cluster of 12 nodes (Task Trackers run on all 12
> nodes and 1 node runs the Job Tracker).
> In order to perform a WordCount benchmark test, I did the following:
>
>    - Executed "RandomTextWriter" first to create 100 GB of data (note that
>    I changed only the "test.randomtextwrite.total_bytes" parameter; all
>    others were kept at their defaults; see the sketch after this list).
>    - Next, executed the "WordCount" program for that 100 GB dataset.
>
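> (Sketch referenced above: a minimal, hypothetical command line for the
> data generation step, assuming the stock CDH3 examples jar and that
> RandomTextWriter accepts -D options via GenericOptionsParser:
>
>   hadoop jar hadoop-examples.jar randomtextwriter \
>     -Dtest.randomtextwrite.total_bytes=107374182400 \
>     <output_dir>
>
> where 107374182400 = 100 * 1024^3 bytes = 100 GB.)
>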
> The "Block Size" in "hdfs-site.xml" is set as 128 MB. Now, according to my
> calculation, total number of Maps to be executed by the wordcount job
> should be 100 GB / 128 MB or 102400 MB / 128 MB = 800.
> But when I execute the job, it runs a total of 900 Maps, i.e., 100 extra.
> So why these extra Maps? The job does complete successfully without any
> error.
>
> Again, if I don't execute the "RandomTextWriter" job to create the data for
> my WordCount, but instead put my own 100 GB text file in HDFS and run
> "WordCount", the number of Maps then matches my calculation, i.e., 800.
>
> Can anyone tell me why Hadoop behaves this way with the number of Maps for
> WordCount only when the dataset is generated by RandomTextWriter? And what
> is the purpose of these extra Maps?
>
> Regards,
> Gaurav Dasgupta
>
>