Re: Number of Maps running more than expected
Hi Bejoy,

The total number of Maps in the RandomTextWriter execution was 100, and
hence the total number of input files for WordCount is 100.
My dfs.block.size = 128MB, and I have not changed
mapred.max.split.size and could not find it in my Job.xml file.
Hence, referring to the formula *max(minsplitsize, min(maxsplitsize, blocksize))*,
I am assuming the effective split size to be 128MB.
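For what it's worth, that formula can be sketched as code (a rough sketch of what FileInputFormat computes internally; the 128 MB block size is from my configuration, and the min/max defaults are assumptions since mapred.max.split.size is unset in my job):

```java
public class SplitSize {
    // Mirrors max(minSplitSize, min(maxSplitSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // dfs.block.size = 128 MB
        long minSize = 1L;                    // assumed default for mapred.min.split.size
        long maxSize = Long.MAX_VALUE;        // mapred.max.split.size not set in my Job.xml
        // With these values the split size equals the block size (134217728 bytes)
        System.out.println(computeSplitSize(blockSize, minSize, maxSize));
    }
}
```

So unless mapred.max.split.size is set below the block size, each split should be one 128 MB block.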
Calculating the blocks per file [bytes per file / block size (128 MB)]
gives me 8.21 for each file, and summing these up gives 821.22 (same as
my previous calculation).
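If the split count rounds up per file (one extra split for the partial last block), the arithmetic would look something like this (a hypothetical sketch; the ~8.21 blocks-per-file figure is taken from my console output):

```java
public class SplitCount {
    // Number of input splits for one file: full blocks plus one for any remainder
    static long splitsForFile(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long splitSize = 128L * 1024 * 1024;            // 128 MB splits
        long fileBytes = (long) (8.21 * splitSize);     // ~8.21 blocks per file
        long perFile = splitsForFile(fileBytes, splitSize); // 9, since the 0.21 block rounds up
        System.out.println(100 * perFile);              // 100 files -> 900 maps
    }
}
```

Rounding up per file gives 9 maps per file and 900 maps in total, which might explain seeing more maps than the 821 I computed by summing fractions.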

I have somehow managed to make a neat copy of the Job.xml in a Word doc. I
copied it from the browser, as I cannot recover it from HDFS. Please find it
in the attachment. You may refer to the parameters and configuration there. I
have also attached the console output for the bytes per file in the
WordCount input.

Gaurav Dasgupta
On Fri, Aug 17, 2012 at 3:28 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:

> Hi Gaurav
> To add on more clarity to my previous mail
> If you are using the default TextInputFormat, there will be *at least* one
> task generated per file, even if the file size is less than
> the block size (assuming your split size is equal to the block size).
> So the right way to calculate the number of splits is per file, not from
> the whole input data size. Calculate the number of blocks per file; summing
> up those values across all files gives the number of mappers.
> What is the value of mapred.max.split.size in your job? If it is less than
> the HDFS block size, there will be multiple splits even within a single HDFS block.
> Regards
> Bejoy KS