Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Spill file compression

Copy link to this message
Spill file compression
Hi guys,

I've encountered a situation where the ratio between "Map output bytes" and
"Map output materialized bytes" is quite huge and during the map-phase data
is spilled to disk quite a lot. This is something I'll try to optimize, but
I'm wondering if the spill files are compressed at all. I set
and mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
and everything else seems to be working correctly. Does Hadoop actually
compress spills or just the final spill after finishing the entire map-task?