MapReduce user mailing list: Spill file compression
Harsh J  2012-11-07, 12:32
Re: Spill file compression
Yes, we do compress each spill output using the same codec as specified
for map (intermediate) output compression. However, the "Map output
bytes" counter may be counting the uncompressed size of the records
written, not their post-compression size.
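To make the relationship between the two counters concrete, here is a
minimal sketch of reading both counters from a completed job (assuming
the Hadoop 2.x org.apache.hadoop.mapreduce API; the class and method
names below are illustrative, not part of this thread):

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MapOutputCompressionReport {
    // Compares raw map output bytes with the materialized (serialized and
    // compressed) bytes for a completed job. Materialized bytes also include
    // a small IFile header/checksum overhead, so the ratio is approximate.
    public static void report(Job job) throws Exception {
        Counters counters = job.getCounters();
        long raw = counters.findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();
        long materialized =
            counters.findCounter(TaskCounter.MAP_OUTPUT_MATERIALIZED_BYTES).getValue();
        System.out.println("Map output bytes:              " + raw);
        System.out.println("Map output materialized bytes: " + materialized);
        if (raw > 0) {
            System.out.printf("Approximate compressed/raw ratio: %.3f%n",
                (double) materialized / raw);
        }
    }
}

A ratio well below 1.0 suggests the configured codec is taking effect on
the materialized output.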

On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann
<[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I've encountered a situation where the ratio between "Map output bytes" and
> "Map output materialized bytes" is quite large, and a lot of data is spilled
> to disk during the map phase. This is something I'll try to optimize, but
> I'm wondering if the spill files are compressed at all. I set
> mapred.compress.map.output=true and
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> and everything else seems to be working correctly. Does Hadoop actually
> compress spills or just the final spill after finishing the entire map-task?
>
> Thanks,
> Sigurd

--
Harsh J
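For completeness, a minimal sketch of how the two properties mentioned
in the question could be set programmatically through the old-API
JobConf (the wrapper class name is illustrative; SnappyCodec also
requires the native Snappy library to be available on each node):

import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputSnappyConfig {
    // Sets the two properties from the question through the old-API JobConf:
    //   mapred.compress.map.output = true
    //   mapred.map.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec
    public static JobConf withSnappyMapOutput(JobConf conf) {
        conf.setCompressMapOutput(true);                      // compress intermediate map output (and spills)
        conf.setMapOutputCompressorClass(SnappyCodec.class);  // needs native Snappy on all nodes
        return conf;
    }
}

In Hadoop 2.x and later, the equivalent property names are
mapreduce.map.output.compress and mapreduce.map.output.compress.codec.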
Follow-up messages in this thread: Sigurd Spieckermann, 2012-11-07 at 13:12, 13:18, 14:29, and 15:14.