Re: issue about total input byte of MR job
It depends on your input data. E.g., if your input consists of 10 files, each
65 MB, then each file takes 2 mappers, so overall the job costs 20 mappers,
but the input size is actually 650 MB rather than 20*64 MB = 1280 MB.
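
To make that concrete, here is a minimal sketch in plain Java (no Hadoop
dependencies; the file sizes are hypothetical, chosen to match the 10 x 65 MB
example above) that mirrors the basic FileInputFormat rule of one split per
block-sized chunk, with a possibly smaller final split per file:

// Why splits * blockSize overestimates the real input size.
public class SplitMath {
    static final long MB = 1024L * 1024L;
    static final long BLOCK_SIZE = 64 * MB; // matches the 64 MB block size in the thread

    public static void main(String[] args) {
        long[] fileSizes = new long[10];
        java.util.Arrays.fill(fileSizes, 65 * MB); // hypothetical: 10 files, 65 MB each

        long splits = 0, actualBytes = 0;
        for (long size : fileSizes) {
            // ceil(size / BLOCK_SIZE): a 65 MB file yields 2 splits,
            // but the second split covers only 1 MB of real data.
            splits += (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
            actualBytes += size;
        }

        System.out.println("splits             = " + splits);                            // 20
        System.out.println("splits * blockSize = " + splits * BLOCK_SIZE / MB + " MB");  // 1280 MB
        System.out.println("actual input bytes = " + actualBytes / MB + " MB");          // 650 MB
    }
}
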
On Tue, Dec 3, 2013 at 4:28 PM, ch huang <[EMAIL PROTECTED]> wrote:

> I ran the MR job, and in the MR output I see
>
> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>
> Because each of my data blocks is 64 MB, the total bytes should be 2717*64 MB/1024 = 170 GB.
>
> But in the summary at the end I see the following info. The HDFS bytes read are
> 126792190158/1024/1024/1024 = 118 GB. The two numbers are not very close; why?
>
>         File System Counters
>                 FILE: Number of bytes read=9642910241
>                 FILE: Number of bytes written=120327706125
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=126792190158
>                 HDFS: Number of bytes written=0
>                 HDFS: Number of read operations=8151
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=0
>
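
For reference, the arithmetic from the quoted question can be checked
directly; a minimal sketch assuming the 64 MB block size and the counter
values quoted above, showing that the split-based figure is only an upper
bound on the HDFS bytes actually read:

// Split-based estimate vs. the HDFS bytes-read counter from the job summary.
public class CounterCheck {
    public static void main(String[] args) {
        long splits = 2717;                     // from "number of splits:2717"
        long blockMB = 64;                      // block size stated by the poster
        long hdfsBytesRead = 126_792_190_158L;  // from the counters above

        double upperBoundGB = splits * blockMB / 1024.0;              // ~169.8 GB
        double actualGB = hdfsBytesRead / 1024.0 / 1024.0 / 1024.0;   // ~118.1 GB

        System.out.printf("upper bound: %.1f GB, actual: %.1f GB%n",
                upperBoundGB, actualGB);
    }
}
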