MapReduce user mailing list

Re: Spill file compression
Yes, we do compress each spill's output using the same codec that is
specified for map (intermediate) output compression. However, the
"Map output bytes" counter may be counting the uncompressed size of the
records written, not their post-compression size.
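To make the counter point concrete, here is a minimal sketch, not Hadoop's actual spill code. It assumes java.util.zip's Deflate stands in for the configured codec (Hadoop's DefaultCodec is zlib/Deflate; Snappy needs a native library that the JDK does not ship). Each "spill" is compressed on its own, the way each spill becomes its own file, while the logical byte total is accumulated before any compression. The class name, record format and fixed records-per-spill threshold are all hypothetical; real map tasks spill when the in-memory sort buffer fills, not after a fixed record count.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

public class SpillCompressionSketch {

    /** Totals collected while "spilling"; the names only loosely mirror Hadoop's counters. */
    static final class Totals {
        long logicalBytes;    // serialized bytes before compression (like "Map output bytes")
        long compressedBytes; // bytes actually written to the simulated spill files
        int spills;           // number of spill files produced
    }

    /** Compresses one spill independently, since every spill is written as a separate file. */
    static byte[] compressSpill(List<byte[]> records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            for (byte[] r : records) {
                out.write(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Emits numRecords repetitive records and spills after every recordsPerSpill of them. */
    static Totals run(int numRecords, int recordsPerSpill) {
        Totals t = new Totals();
        List<byte[]> buffer = new ArrayList<>();
        for (int i = 0; i < numRecords; i++) {
            byte[] rec = ("key" + (i % 50) + "\tvalue-" + (i % 10) + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            t.logicalBytes += rec.length; // counted before any compression happens
            buffer.add(rec);
            // Spill when the buffer is "full", and flush whatever is left at the end.
            if (buffer.size() == recordsPerSpill || i == numRecords - 1) {
                t.compressedBytes += compressSpill(buffer).length;
                t.spills++;
                buffer.clear();
            }
        }
        return t;
    }

    public static void main(String[] args) {
        Totals t = run(5000, 1000);
        System.out.printf("spills=%d logical=%d compressed=%d%n",
                t.spills, t.logicalBytes, t.compressedBytes);
    }
}
```

With repetitive records like these, the logical total is many times larger than the compressed total, which is the same kind of gap the question describes between the two counters.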

On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann wrote:
> Hi guys,
> I've encountered a situation where the ratio between "Map output bytes" and
> "Map output materialized bytes" is very large, and during the map phase a lot
> of data is spilled to disk. This is something I'll try to optimize, but
> I'm wondering whether the spill files are compressed at all. I set
> mapred.compress.map.output=true and
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> and everything else seems to be working correctly. Does Hadoop actually
> compress every spill, or only the final output written after the entire
> map task finishes?
> Thanks,
> Sigurd

Harsh J
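The two settings from the question use the Hadoop 1.x property names; in Hadoop 2.x and later they are deprecated in favour of mapreduce.map.output.compress and mapreduce.map.output.compress.codec. A simplified sketch of checking either spelling, assuming a plain java.util.Properties in place of Hadoop's Configuration. The class and its helpers are hypothetical, and the "new key wins" rule is a simplification of my own, not Hadoop's actual deprecation handling (which treats the two names as aliases).

```java
import java.util.Properties;

public class MapOutputCompressionCheck {

    // Hadoop 1.x names, as used in the question.
    static final String OLD_ENABLE = "mapred.compress.map.output";
    static final String OLD_CODEC = "mapred.map.output.compression.codec";
    // Hadoop 2.x+ names that replace them.
    static final String NEW_ENABLE = "mapreduce.map.output.compress";
    static final String NEW_CODEC = "mapreduce.map.output.compress.codec";

    /** Hypothetical lookup: prefer the new key, fall back to the old one, else null. */
    static String lookup(Properties conf, String newKey, String oldKey) {
        String v = conf.getProperty(newKey);
        return v != null ? v : conf.getProperty(oldKey);
    }

    /** Unset means disabled, since Boolean.parseBoolean(null) returns false. */
    static boolean compressionEnabled(Properties conf) {
        return Boolean.parseBoolean(lookup(conf, NEW_ENABLE, OLD_ENABLE));
    }

    /** Returns the configured codec class name, or null if none is set here. */
    static String codec(Properties conf) {
        return lookup(conf, NEW_CODEC, OLD_CODEC);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OLD_ENABLE, "true");
        conf.setProperty(OLD_CODEC, "org.apache.hadoop.io.compress.SnappyCodec");
        System.out.println(compressionEnabled(conf) + " " + codec(conf));
    }
}
```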