Hadoop IO performance, prefetch etc
Songting Chen 2009-02-05, 06:55
Hi,
Most of our map jobs are IO bound. However, on a given node, the IO throughput during the map phase is only 20% of the node's real sequential IO capability (we measured the sequential IO throughput with iozone).

I think the reason is that, although each map issues sequential IO requests, many maps run concurrently on the same node, and this causes quite expensive IO switches between them.

Prefetching may be a good solution here, especially since a map job is supposed to scan through an entire block, no more and no less. Any idea how to enable it?

Thanks,
-Songting
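
To make the reasoning above concrete, here is a rough back-of-the-envelope model of why interleaved sequential readers can fall far below a disk's sequential rate, and why larger reads (prefetching) would help. It is only a sketch: the function name and the default figures (80 MB/s sequential bandwidth, 10 ms per switch between streams) are assumptions chosen for illustration, not measurements from the cluster described here, and the model assumes one full seek before every read.

```python
def effective_throughput_mb_s(chunk_mb, seq_mb_s=80.0, seek_ms=10.0):
    """Aggregate disk throughput if every read of chunk_mb pays one seek first.

    seq_mb_s and seek_ms are assumed, illustrative figures for a single disk.
    """
    transfer_s = chunk_mb / seq_mb_s   # time spent actually moving data
    seek_s = seek_ms / 1000.0          # time lost switching between streams
    return chunk_mb / (seek_s + transfer_s)


if __name__ == "__main__":
    # Larger reads amortise the per-switch seek over more data.
    for chunk_mb in (0.0625, 0.25, 1.0, 4.0, 16.0, 64.0):
        rate = effective_throughput_mb_s(chunk_mb)
        print(f"{chunk_mb:8.4f} MB per read -> {rate:6.1f} MB/s "
              f"({rate / 80.0:.0%} of sequential)")
```

Under these assumed numbers, the disk only approaches its sequential rate once each stream reads several megabytes between switches, which is the effect a prefetch or read-ahead setting would be aiming for.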