
Hadoop user mailing list: Hadoop IO performance, prefetch etc

Songting Chen 2009-02-05, 06:55
   Most of our map jobs are IO-bound. However, on a given node, the IO throughput during the map phase is only 20% of that node's actual sequential IO capability (we measured the sequential IO throughput with iozone).
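A rough Python stand-in for the comparison above, assuming a plain local file: it reads the file front to back with large buffers and reports MB/s, and a second helper expresses a map-phase figure as a fraction of that sequential figure. Both function names are hypothetical, and unlike iozone this sketch does not bypass the OS page cache, so a recently written or read file will show cache speed rather than disk speed.

```python
import time


def sequential_read_mb_per_s(path, buf_size=4 * 1024 * 1024):
    """Read `path` sequentially in large chunks and return throughput in MB/s.

    Caveat: pages already in the OS page cache are served from memory,
    so this only approximates raw disk speed on a cold cache.
    """
    total = 0
    start = time.perf_counter()
    # Raw (unbuffered) file object: each read() goes straight to the OS.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total / (1024 * 1024) / elapsed


def utilization(observed_mb_s, sequential_mb_s):
    """Fraction of the node's sequential capability actually achieved."""
    return observed_mb_s / sequential_mb_s
```

For example, with hypothetical figures of 20 MB/s observed during the map phase and 100 MB/s sequential, `utilization(20, 100)` returns `0.2`, i.e. the 20% ratio described in the post.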
   I think the reason is that although each map issues sequential IO requests, many maps run concurrently on the same node, so the disk keeps switching between their streams, and these IO switches are quite expensive.
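The cost of those switches can be made concrete with a deliberately simple disk model (an assumption, not a measurement): every time the disk moves from one map's stream to another's it pays one seek, then transfers one chunk at full sequential bandwidth. The function name and all numbers below are illustrative only.

```python
def interleaved_throughput(bandwidth_mb_s, seek_s, chunk_mb):
    """Effective MB/s when the disk alternates between concurrent streams.

    Simplified model: each switch to another stream costs one seek of
    `seek_s` seconds, followed by a transfer of `chunk_mb` megabytes at the
    full sequential bandwidth. Rotational effects and caching are ignored.
    """
    transfer_s = chunk_mb / bandwidth_mb_s
    return chunk_mb / (seek_s + transfer_s)
```

With an assumed 100 MB/s disk and 10 ms per seek, reading 0.25 MB per switch yields about 20 MB/s, while reading 4 MB per switch yields about 80 MB/s. In this model, the larger each read is before the disk switches streams, the better the seek cost is amortized, which is the intuition behind prefetching whole chunks of a block.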
   Prefetching may be a good solution here, especially since each map is supposed to scan through exactly one entire block, no more and no less. Any idea how to enable it?
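A minimal, hypothetical sketch of what prefetching means at the application level: a background thread reads a file (for instance, a local copy of one block) in large chunks ahead of the consumer, so the disk keeps receiving big sequential requests while the caller processes the current chunk. This does not show how to enable read-ahead inside Hadoop itself; there, reads go through HDFS, so any prefetching would have to happen in the HDFS client or DataNode. `prefetching_reader`, the 4 MB chunk size and the queue depth are assumptions, not Hadoop settings. `os.posix_fadvise` with `POSIX_FADV_SEQUENTIAL` is a real POSIX hint to the kernel, used here only where the platform provides it.

```python
import os
import queue
import threading


def prefetching_reader(path, chunk_size=4 * 1024 * 1024, depth=2):
    """Yield `path` in chunks while a background thread reads ahead.

    At most `depth` chunks wait in the queue, which bounds memory use while
    letting the reader stay ahead of a slower consumer (read-ahead /
    double buffering).
    """
    q = queue.Queue(maxsize=depth)
    end = object()  # sentinel marking end of file

    def worker():
        try:
            with open(path, "rb") as f:
                # Optional kernel hint that access will be sequential;
                # only available where the OS provides posix_fadvise.
                if hasattr(os, "posix_fadvise"):
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    q.put(chunk)  # blocks while `depth` chunks are pending
        except Exception as exc:
            q.put(exc)  # hand the error to the consumer
            return
        q.put(end)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is end:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Usage would look like `for chunk in prefetching_reader(path): process(chunk)`, where `process` stands for the map's own per-record work. The bounded queue is the main design choice: it keeps the reader a fixed distance ahead instead of loading the whole block into memory. If the consumer stops early, the daemon worker may stay blocked on `put`, which is acceptable for a sketch but not for production code.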