HDFS, mail # user - Re: MAP_INPUT_BYTES missing from counters


Re: MAP_INPUT_BYTES missing from counters
yypvsxf19870706 2013-04-06, 11:37
Hi

     Is your input file compressed, or named with a suffix such as .gz?
     This is interesting.
     MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by all the maps in the job. It is incremented every time a record is read from a RecordReader and passed to the map's map() method by the framework. [Hadoop: The Definitive Guide, page 226]
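
     For the programmatic check, here is a minimal Python sketch (plain text parsing, not Hadoop API code) of a fallback lookup over a counter dump in the "NAME: value" layout quoted further down in this thread. The helper names parse_counters and input_bytes are hypothetical. Treating BYTES_READ or HDFS_BYTES_READ as a stand-in for MAP_INPUT_BYTES is an assumption: they need not count exactly the same thing (in the quoted dump they differ by 160 bytes, which matches SPLIT_RAW_BYTES).

```python
import re

# Abridged counter dump, copied from the counters quoted in this thread.
SAMPLE = """\
File Input Format Counters
 BYTES_READ: 1085357
FileSystemCounters
 HDFS_BYTES_READ: 1085517
Map-Reduce Framework
 MAP_INPUT_RECORDS: 19446
 SPLIT_RAW_BYTES: 160
 COMBINE_OUTPUT_RECORDS:35179
"""


def parse_counters(text):
    """Return {counter_name: int} for every 'NAME: value' line.

    Group headers (lines without 'NAME: number') are skipped, and a leading
    '>' quote marker is tolerated. Group names are dropped, which is a
    simplification: same-named counters in different groups would collide.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^[>\s]*([A-Z_]+)\s*:\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def input_bytes(counters,
                preferred=("MAP_INPUT_BYTES", "BYTES_READ", "HDFS_BYTES_READ")):
    """Return (name, value) of the first preferred counter present, else None.

    The fallback order is an assumption made for this sketch, not something
    the Hadoop documentation prescribes.
    """
    for name in preferred:
        if name in counters:
            return name, counters[name]
    return None


counters = parse_counters(SAMPLE)
print(input_bytes(counters))
```

     Returning the counter name along with the value makes it visible which counter was actually used, so a caller can tell a true MAP_INPUT_BYTES reading from a fallback.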

   Please let us know if you find anything further.

Regards.

Sent from my iPhone

On 2013-4-6, at 0:01, Philippe Signoret <[EMAIL PROTECTED]> wrote:

> I noticed recently that some Word Count jobs I've run are finishing with the MAP_INPUT_BYTES counter missing.
>
> I'm using Hadoop 1.1.2 with mostly default configuration with 5 nodes. The input was a single 100KB text file.
>
> Questions:
> 1. Is it normal for any final counter values not to be present?
> 2. Is MAP_INPUT_BYTES the best way to determine total input data size? (I do so programmatically, while it's running and after the job is complete.)
>
> The counters I did get:
>
> Job Counters
>  TOTAL_LAUNCHED_REDUCES: 1
>  SLOTS_MILLIS_MAPS: 6006
>  FALLOW_SLOTS_MILLIS_REDUCES: 0
>  FALLOW_SLOTS_MILLIS_MAPS: 0
>  TOTAL_LAUNCHED_MAPS: 1
>  DATA_LOCAL_MAPS: 1
>  SLOTS_MILLIS_REDUCES: 9293
> File Output Format Counters
>  BYTES_WRITTEN: 366752
> FileSystemCounters
>  FILE_BYTES_READ: 505552
>  HDFS_BYTES_READ: 1085517
>  FILE_BYTES_WRITTEN: 1122685
>  HDFS_BYTES_WRITTEN: 366752
> File Input Format Counters
>  BYTES_READ: 1085357
> Map-Reduce Framework
>  MAP_OUTPUT_MATERIALIZED_BYTES: 505552
>  MAP_INPUT_RECORDS: 19446
>  REDUCE_SHUFFLE_BYTES: 505552
>  SPILLED_RECORDS: 70358
>  MAP_OUTPUT_BYTES: 1750111
>  CPU_MILLISECONDS: 5700
>  COMMITTED_HEAP_BYTES: 401997824
>  COMBINE_INPUT_RECORDS: 181151
>  SPLIT_RAW_BYTES: 160
>  REDUCE_INPUT_RECORDS: 35179
>  REDUCE_INPUT_GROUPS: 35179
>  COMBINE_OUTPUT_RECORDS: 35179
>  PHYSICAL_MEMORY_BYTES: 378482688
>  REDUCE_OUTPUT_RECORDS: 35179
>  VIRTUAL_MEMORY_BYTES: 1139838976
>  MAP_OUTPUT_RECORDS: 181151
>
> Here are most of the relevant screens from the JobTracker web interface: http://jsfiddle.net/Fguyy/2/embedded/result/
>
> Here is the JobTracker log (relevant time frame): http://pastebin.com/dvsMn4fB
>
> Thanks!
> Philippe
>
> -------------------------------
> Philippe Signoret
> Skype: philippesignoret
> +33 6 95 89 55 55