Re: issue about total input byte of MR job
It depends on your input data. E.g., if your input consists of 10 files, each
65 MB, then each file takes 2 mappers, so the job uses 20 mappers overall,
but the input size is actually 650 MB rather than 20*64 MB = 1280 MB.
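A minimal sketch of that arithmetic in Java (hypothetical class name; it uses
plain ceiling division per file, whereas the real FileInputFormat also applies
a ~10% slop factor when cutting splits, so exact split counts can differ):

    public class SplitMath {
        // Assumed split size, equal to the 64 MB block size discussed above.
        static final long SPLIT_SIZE = 64L * 1024 * 1024;

        public static void main(String[] args) {
            long fileSize = 65L * 1024 * 1024; // each input file is 65 MB
            int files = 10;

            // One split per full block plus one for the remainder:
            // ceil(fileSize / SPLIT_SIZE)
            long splitsPerFile = (fileSize + SPLIT_SIZE - 1) / SPLIT_SIZE; // = 2
            long totalSplits = splitsPerFile * files;                      // = 20 mappers
            long actualInputMB = fileSize * files / (1024 * 1024);         // = 650 MB
            long naiveInputMB = totalSplits * 64;                          // = 1280 MB

            System.out.println("splits = " + totalSplits);
            System.out.println("actual input = " + actualInputMB + " MB");
            System.out.println("splits * 64 MB = " + naiveInputMB + " MB");
        }
    }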
On Tue, Dec 3, 2013 at 4:28 PM, ch huang <[EMAIL PROTECTED]> wrote:

> I ran an MR job, and in the MR output I see
>
> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>
> Since each of my data blocks is 64 MB, the total should be 2717*64 MB/1024 = 170 GB.
>
> But in the summary at the end I see the following info, where the HDFS bytes read
> are 126792190158/1024/1024/1024 = 118 GB. The two numbers are not very close; why?
>
>         File System Counters
>                 FILE: Number of bytes read=9642910241
>                 FILE: Number of bytes written=120327706125
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=126792190158
>                 HDFS: Number of bytes written=0
>                 HDFS: Number of read operations=8151
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=0
>
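For completeness, a quick check of the numbers quoted above (both constants are
taken verbatim from the job output; the gap between the two figures is expected
whenever the last split of a file is shorter than a full block):

    public class CounterCheck {
        public static void main(String[] args) {
            long splits = 2717;                 // from "number of splits:2717"
            long hdfsBytesRead = 126792190158L; // from "HDFS: Number of bytes read"

            // Upper bound if every split were a full 64 MB block.
            double upperBoundGB = splits * 64 / 1024.0;               // ~169.8 GB
            // What the job actually read from HDFS.
            double actualGB = hdfsBytesRead / (1024.0 * 1024 * 1024); // ~118.1 GB

            System.out.printf("upper bound %.1f GB vs actual %.1f GB%n",
                    upperBoundGB, actualGB);
        }
    }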