Re: "Map input bytes" vs HDFS_BYTES_READ
If map tasks were retried (up to mapred.map.max.attempts times), how would
these two counters be affected?

Thanks
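
For reference, here is a rough sketch of how the two counters could be pulled
for a finished job (this assumes the Hadoop 0.20.x mapred API; the internal
group/counter name strings below are my guesses based on how they appear in
the job file, so adjust them if findCounter() returns null):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class DumpJobCounters {
  public static void main(String[] args) throws Exception {
    // Look up the job by its ID, e.g. job_201102010000_0001.
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName(args[0]));
    Counters counters = job.getCounters();

    // FileSystem-level counter: bytes the tasks actually read from HDFS.
    long hdfsBytesRead = counters.findCounter(
        "FileSystemCounters", "HDFS_BYTES_READ").getCounter();

    // Framework counter: bytes the RecordReader delivered as map input.
    long mapInputBytes = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_BYTES").getCounter();

    System.out.println("HDFS_BYTES_READ = " + hdfsBytesRead);
    System.out.println("Map input bytes = " + mapInputBytes);
  }
}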

On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> HDFS_BYTES_READ is a FileSystem interface counter: it directly tracks
> the lower-level FS reads. Map input bytes is the number of bytes the
> RecordReader has processed for the records it read from the input
> stream.
>
> For plain text files, I believe both counters should report about the
> same value, provided entire records are read with no transformation
> applied to each line. But when you throw in a compressed file, you'll
> notice that HDFS_BYTES_READ is far less than Map input bytes, since the
> on-disk read was small while the total content delivered as records was
> still the same as it would be for an uncompressed file.
>
> Hope this clears it.
>
> On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > In hadoop 0.20.2, what's the relationship between "Map input bytes" and
> > HDFS_BYTES_READ ?
> >
> > <counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
> > <counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
> > <counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
> > <counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
> > <counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
> > <counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
> >
> > Thanks
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
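
For what it's worth, the numbers I posted bear out the compression point:
Map input bytes (~966 GB) is roughly 4.7x HDFS_BYTES_READ (~203 GB), which is
about what you'd expect if the input were compressed at that ratio. A small
standalone sketch (plain Java, not Hadoop code; the file name is made up)
that mirrors the two measurements for a single gzipped file:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class CompressedInputDemo {
  public static void main(String[] args) throws IOException {
    File input = new File("part-00000.gz");  // hypothetical compressed input
    long bytesOnDisk = input.length();       // analogous to HDFS_BYTES_READ

    long recordBytes = 0;                    // analogous to "Map input bytes"
    BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new FileInputStream(input))));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        recordBytes += line.length() + 1;    // record plus newline, roughly
      }
    } finally {
      reader.close();
    }

    System.out.println("bytes read from disk = " + bytesOnDisk);
    System.out.println("record bytes read    = " + recordBytes);
  }
}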