I am benchmarking my cluster of 16 nodes (all in one rack) with TestDFSIO on
Hadoop 1.0.4. For simplicity, I turned off speculative task execution and set
the max map and reduce tasks to 1.
With a replication factor of 2, writing one 5GB file takes twice as long as
reading one file. That seems to make sense, since replication means a write
does twice the I/O across the cluster that a read does. However, as I scale
the number of 5GB files from 1 to 64, reading ultimately takes as long as
writing. In particular, I see this result when writing and reading 64 files.
What could cause read performance to degrade faster than write performance
as the number of files increases?
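
For reference, the 2x expectation can be written out as a rough calculation.
This is only a sketch of the byte counts, assuming every replica is written
once and each read fetches a single replica; the function name bytes_moved
is made up for illustration and is not part of TestDFSIO.

```python
GB = 1024 ** 3

def bytes_moved(num_files, file_size_bytes, replication):
    """Return (bytes_written, bytes_read) summed over the whole cluster.

    Assumption: a write stores `replication` copies of each file,
    while a read fetches exactly one copy.
    """
    written = num_files * file_size_bytes * replication
    read = num_files * file_size_bytes
    return written, read

for n in (1, 64):
    written, read = bytes_moved(n, 5 * GB, replication=2)
    print(f"{n} file(s): write/read byte ratio = {written / read:.1f}")
```

Under these assumptions the byte ratio stays at 2 regardless of the number
of files, so data volume alone would not account for read time catching up
with write time.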
The full results (number of 5GB files, ratio of write time to read
time) are below: