MapReduce, mail # user - Spill file compression


Spill file compression
Sigurd Spieckermann 2012-11-07, 12:32
Hi guys,

I've encountered a situation where the ratio between "Map output bytes" and
"Map output materialized bytes" is very large, and a lot of data is spilled
to disk during the map phase. I'll try to optimize this, but I'm wondering
whether the spill files are compressed at all. I have set
mapred.compress.map.output=true
and mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec,
and everything else seems to be working correctly. Does Hadoop actually
compress every spill, or only the final spill written after the entire map
task has finished?
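
To make the ratio in question concrete, here is a minimal sketch of how it could be computed from the two job counters. It assumes that "Map output bytes" counts the uncompressed serialized map output and that "Map output materialized bytes" counts the bytes actually written to disk for the final map output. The class name, method name, and counter values are hypothetical and used only for illustration; they are not taken from a real job.

```java
public class SpillRatio {

    /**
     * Ratio of uncompressed map output to the bytes materialized on disk.
     * A value well above 1 suggests the final map output was compressed.
     */
    static double compressionRatio(long mapOutputBytes, long materializedBytes) {
        if (materializedBytes <= 0) {
            throw new IllegalArgumentException("materializedBytes must be positive");
        }
        return (double) mapOutputBytes / materializedBytes;
    }

    public static void main(String[] args) {
        // Hypothetical counter values, as they might be read off the job's counter page.
        long mapOutputBytes = 8_000_000_000L;
        long materializedBytes = 1_000_000_000L;

        System.out.println("ratio = " + compressionRatio(mapOutputBytes, materializedBytes));
    }
}
```

Note that this ratio only describes the final map output. It says nothing by itself about whether the intermediate spill files were compressed, which is the open question here.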

Thanks,
Sigurd