I am analyzing some HDFS counters, and I have two questions:
1 - Is the "HDFS: Number of bytes read" counter updated incrementally as the map tasks read data from HDFS, or is it a sum pre-calculated before the mappers start?
2 - These metrics show that data was already written to HDFS while the map tasks were still running (reduce completion is 0.0). Does anyone have an opinion on whether the map tasks could be writing their intermediate output to HDFS? Could this happen because the user-defined job forces it to (I don't know what this job does)?
<mapcompletion>map() completion: 0.9946828</mapcompletion>
<redcompletion>reduce() completion: 0.0</redcompletion>
<hdfs>HDFS: Number of bytes read=314470180</hdfs>
<hdfs>HDFS: Number of bytes written=313912087</hdfs>