MapReduce >> mail # dev >> "Map input bytes" vs HDFS_BYTES_READ


Re: "Map input bytes" vs HDFS_BYTES_READ
From my limited experiments, I think "Map input bytes" reflects the number of
bytes in the local data file(s) when LocalJobRunner is used.

Correct me if I am wrong.

On Tue, Feb 1, 2011 at 7:52 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Each task attempt maintains its counters independently of other attempts
> and tasks, which makes the aggregates easier to control. Final counters are
> aggregated only from successfully committed tasks. During the job's run,
> however, the counters shown are aggregated from the most successful attempt
> of each task thus far.
>
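The aggregation rule described above can be illustrated with a small sketch. This is not Hadoop code: the attempt records, task IDs, states, and byte values below are all made up for illustration, and it assumes at most one attempt per task ends up committed.

```python
from collections import defaultdict

# Hypothetical attempt records: (task_id, attempt_no, state, counters).
# All values are invented for illustration.
attempts = [
    ("m_000000", 0, "FAILED",    {"Map input bytes": 400,  "HDFS_BYTES_READ": 100}),
    ("m_000000", 1, "SUCCEEDED", {"Map input bytes": 1000, "HDFS_BYTES_READ": 250}),
    ("m_000001", 0, "SUCCEEDED", {"Map input bytes": 800,  "HDFS_BYTES_READ": 200}),
]

def final_counters(attempts):
    """Sum counters over successfully committed attempts only."""
    totals = defaultdict(int)
    for _task_id, _attempt_no, state, counters in attempts:
        if state != "SUCCEEDED":
            continue  # failed or killed attempts do not contribute
        for name, value in counters.items():
            totals[name] += value
    return dict(totals)

totals = final_counters(attempts)
print(totals)
```

In this toy model the bytes processed by the failed attempt of m_000000 never reach the final totals, so a retried task does not inflate either counter.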
> On Wed, Feb 2, 2011 at 9:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > If map task(s) were retried (mapred.map.max.attempts times), how would
> > these two counters be affected?
> >
> > Thanks
> >
> > On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> HDFS_BYTES_READ is a FileSystem interface counter. It deals directly
> >> with the FS read (lower level). Map input bytes is the number of bytes
> >> the RecordReader has processed for the records it read from the input
> >> stream.
> >>
> >> For plain text files, I believe both counters should report about the
> >> same value, provided entire records are read with no operation performed
> >> on each line. But when you throw in a compressed file, you'll notice
> >> that HDFS_BYTES_READ is far lower than Map input bytes: the disk read
> >> was small, but the total content in record terms is still the same as
> >> it would be for an uncompressed file.
> >>
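The compressed-file case can be reproduced outside Hadoop with a rough simulation. The sketch below uses Python's gzip module on an in-memory buffer; the sample records and variable names are invented, the compressed size stands in for what a file-system-level counter would see, and the decompressed line lengths stand in for what a record reader would count.

```python
import gzip
import io

# Invented sample data: 10,000 repetitive text records.
lines = [f"record {i}: some repetitive log text\n" for i in range(10000)]
raw = "".join(lines).encode("utf-8")

# What is stored (and read from storage) is the compressed form.
stored = gzip.compress(raw)
bytes_read_from_storage = len(stored)   # analogous to HDFS_BYTES_READ

# What the reader hands out record by record is the decompressed content.
record_bytes = 0                        # analogous to "Map input bytes"
record_count = 0
with gzip.open(io.BytesIO(stored), "rb") as stream:
    for line in stream:
        record_bytes += len(line)
        record_count += 1

print(bytes_read_from_storage, record_bytes, record_count)
```

For compressible text like this, the storage-level byte count comes out well below the record-level byte count, which is the same direction as the gap between the two counters discussed here.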
> >> Hope this clears it up.
> >>
> >> On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> > In Hadoop 0.20.2, what's the relationship between "Map input bytes" and
> >> > HDFS_BYTES_READ?
> >> >
> >> > <counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
> >> > <counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
> >> > <counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
> >> > <counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
> >> > <counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
> >> > <counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
> >> >
> >> > Thanks
> >> >
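To compare the two counters in the dump above, one option is to parse the counter elements and compute the ratio. The sketch below copies three of the counters from the question; the wrapping `<counters>` root element is an assumption added only so the fragment parses as XML, since the original snippet has no root.

```python
import xml.etree.ElementTree as ET

# Counter values copied from the question; the <counters> root is hypothetical.
snippet = """<counters>
<counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
<counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
<counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
</counters>"""

counters = {
    c.get("name"): int(c.text)
    for c in ET.fromstring(snippet).iter("counter")
}

# How many record-level bytes were processed per byte read from HDFS.
ratio = counters["Map input bytes"] / counters["HDFS_BYTES_READ"]
# Average record size as seen by the record reader.
avg_record_bytes = counters["Map input bytes"] / counters["Map input records"]

print(f"Map input bytes / HDFS_BYTES_READ = {ratio:.2f}")
print(f"average bytes per input record = {avg_record_bytes:.0f}")
```

The ratio for this job is a bit under 5, which would be consistent with compressed input, though the counters alone do not prove that.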
> >>
> >>
> >>
> >> --
> >> Harsh J
> >> www.harshj.com
> >>
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>