
Hadoop IO performance, prefetch etc
   Most of our map jobs are IO bound. However, on a given node, the IO throughput during the map phase is only 20% of that node's real sequential IO capability (we measured the sequential IO throughput with iozone).
   I think the reason is that although each map issues sequential IO requests, many maps run concurrently on the same node, so the disk keeps switching between their streams, and these IO switches are quite expensive.
   Prefetching may be a good solution here, especially since a map job is supposed to scan through an entire block, no more and no less. Any idea how to enable it?
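
   To make the reasoning above concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes the disk pays one fixed switch (seek) cost each time it moves from one map's stream to another, and then reads one contiguous chunk at full sequential speed. The 100 MB/s sequential rate, the 10 ms switch cost, and the chunk sizes are all hypothetical values chosen for illustration; the function name is made up as well.

```python
def effective_throughput(seq_mb_per_s, switch_s, chunk_mb):
    """MB/s delivered when the disk pays one switch (seek) per chunk read.

    Simplified model: time per chunk = switch cost + transfer time at the
    full sequential rate. Real disks and IO schedulers are more complex.
    """
    transfer_s = chunk_mb / seq_mb_per_s
    return chunk_mb / (switch_s + transfer_s)


# Hypothetical numbers, for illustration only (not measured on any cluster).
SEQ_MB_PER_S = 100.0  # assumed sequential throughput of one disk
SWITCH_S = 0.010      # assumed cost of switching between streams (10 ms)

for chunk_mb in (0.25, 1.0, 8.0, 64.0):
    rate = effective_throughput(SEQ_MB_PER_S, SWITCH_S, chunk_mb)
    print(f"{chunk_mb:6.2f} MB per switch -> {rate:5.1f} MB/s "
          f"({rate / SEQ_MB_PER_S:.0%} of sequential)")
```

   Under these assumed numbers, reading only 0.25 MB between switches yields roughly 20% of the sequential rate, while reading 64 MB between switches yields close to the full rate. That is the effect a larger prefetch would be expected to have: it amortizes each switch over more data.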