HDFS >> mail # user >> Re: MAP_INPUT_BYTES missing from counters


Re: MAP_INPUT_BYTES missing from counters
Hi

Is your input file compressed, or named with a suffix such as .gz?
If so, that would be interesting.
MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by all the maps in the job. It is incremented every time a record is read from a RecordReader and passed to the map's map() method by the framework. [Hadoop: The Definitive Guide, page 226]
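
To make the lookup concrete, here is a minimal sketch in plain Java of how one might check for MAP_INPUT_BYTES and fall back to another counter when it is absent. It does not use the Hadoop API: it only parses a text dump shaped like the counter listing in the quoted message. The class name CounterDump, its method names, and the choice of BYTES_READ (under "File Input Format Counters") as a fallback are all assumptions for illustration; whether BYTES_READ is an acceptable substitute for your purposes would need to be confirmed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper (not part of Hadoop): parses a plain-text counter dump.
final class CounterDump {

    // Builds group -> (counter name -> value) from lines such as
    // "Map-Reduce Framework" (group header) and " MAP_INPUT_RECORDS: 19446".
    public static Map<String, Map<String, Long>> parse(String dump) {
        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        Map<String, Long> current = null;
        for (String raw : dump.split("\n")) {
            // Drop a leading mail-quote marker ("> ") and surrounding whitespace.
            String line = raw.replaceFirst("^\\s*>\\s?", "").trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.lastIndexOf(':');
            if (colon < 0) {
                // Assumption: a line without a colon starts a new counter group.
                current = new LinkedHashMap<>();
                groups.put(line, current);
            } else if (current != null) {
                String name = line.substring(0, colon).trim();
                long value = Long.parseLong(line.substring(colon + 1).trim());
                current.put(name, value);
            }
        }
        return groups;
    }

    // Returns the counter value, or empty if the group or counter is missing.
    public static OptionalLong find(Map<String, Map<String, Long>> groups,
                                    String group, String name) {
        Map<String, Long> counters = groups.get(group);
        if (counters == null || !counters.containsKey(name)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(counters.get(name));
    }

    // Prefers MAP_INPUT_BYTES; falls back to BYTES_READ (an assumed substitute).
    public static OptionalLong inputBytes(Map<String, Map<String, Long>> groups) {
        OptionalLong primary = find(groups, "Map-Reduce Framework", "MAP_INPUT_BYTES");
        return primary.isPresent()
                ? primary
                : find(groups, "File Input Format Counters", "BYTES_READ");
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "> File Input Format Counters",
                ">  BYTES_READ: 1085357",
                "> Map-Reduce Framework",
                ">  MAP_INPUT_RECORDS: 19446");
        OptionalLong bytes = inputBytes(parse(dump));
        if (bytes.isPresent()) {
            System.out.println("input bytes (possibly from fallback): " + bytes.getAsLong());
        } else {
            System.out.println("no input-size counter found");
        }
    }
}
```

Returning an OptionalLong rather than 0 keeps "counter missing" distinct from "counter present with value zero", which is the distinction the question is about.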

Please let us know if you find anything further.

Regards.

Sent from my iPhone

On 2013-4-6, at 0:01, Philippe Signoret <[EMAIL PROTECTED]> wrote:

> I noticed recently that some Word Count jobs I've run are finishing with the MAP_INPUT_BYTES counter missing.
>
> I'm using Hadoop 1.1.2 with mostly default configuration with 5 nodes. The input was a single 100KB text file.
>
> Questions:
> 1. Is it normal for any final counter values not to be present?
> 2. Is MAP_INPUT_BYTES the best way to determine total input data size? (I do so programmatically, while the job is running and after it is complete.)
>
> The counters I did get:
>
> Job Counters
>  TOTAL_LAUNCHED_REDUCES: 1
>  SLOTS_MILLIS_MAPS: 6006
>  FALLOW_SLOTS_MILLIS_REDUCES: 0
>  FALLOW_SLOTS_MILLIS_MAPS: 0
>  TOTAL_LAUNCHED_MAPS: 1
>  DATA_LOCAL_MAPS: 1
>  SLOTS_MILLIS_REDUCES: 9293
> File Output Format Counters
>  BYTES_WRITTEN: 366752
> FileSystemCounters
>  FILE_BYTES_READ: 505552
>  HDFS_BYTES_READ: 1085517
>  FILE_BYTES_WRITTEN: 1122685
>  HDFS_BYTES_WRITTEN: 366752
> File Input Format Counters
>  BYTES_READ: 1085357
> Map-Reduce Framework
>  MAP_OUTPUT_MATERIALIZED_BYTES: 505552
>  MAP_INPUT_RECORDS: 19446
>  REDUCE_SHUFFLE_BYTES: 505552
>  SPILLED_RECORDS: 70358
>  MAP_OUTPUT_BYTES: 1750111
>  CPU_MILLISECONDS: 5700
>  COMMITTED_HEAP_BYTES: 401997824
>  COMBINE_INPUT_RECORDS: 181151
>  SPLIT_RAW_BYTES: 160
>  REDUCE_INPUT_RECORDS: 35179
>  REDUCE_INPUT_GROUPS: 35179
>  COMBINE_OUTPUT_RECORDS: 35179
>  PHYSICAL_MEMORY_BYTES: 378482688
>  REDUCE_OUTPUT_RECORDS: 35179
>  VIRTUAL_MEMORY_BYTES: 1139838976
>  MAP_OUTPUT_RECORDS: 181151
>
> Here are most of the relevant screens from the JobTracker web interface: http://jsfiddle.net/Fguyy/2/embedded/result/
>
> Here is the JobTracker log (relevant time frame): http://pastebin.com/dvsMn4fB
>
> Thanks!
> Philippe
>
> -------------------------------
> Philippe Signoret
> Skype: philippesignoret
> +33 6 95 89 55 55
Philippe Signoret 2013-04-06, 23:10