MapReduce >> mail # user >> Spill file compression


Spill file compression
Hi guys,

I've run into a job where the ratio between "Map output bytes" and
"Map output materialized bytes" is very large, and a lot of data is spilled
to disk during the map phase. That is something I'll try to optimize, but
I'm wondering whether the spill files are compressed at all. I set
mapred.compress.map.output=true
and mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
and everything else seems to be working correctly. Does Hadoop actually
compress every spill, or just the final spill once the entire map task has finished?
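
To make the ratio I mean concrete, here is a minimal sketch of how I compute it from the two counters. It assumes "Map output bytes" is the uncompressed size and "Map output materialized bytes" is what actually ends up on disk. The class name, the method name and the numbers are made up for illustration; they are not values from my job.

```java
public class SpillRatio {
    // Ratio of "Map output bytes" to "Map output materialized bytes".
    // Assumption: the first counter is the uncompressed size and the second
    // is the size actually written, so a large ratio suggests heavy compression.
    static double compressionRatio(long mapOutputBytes, long materializedBytes) {
        if (materializedBytes <= 0) {
            throw new IllegalArgumentException("materializedBytes must be positive");
        }
        return (double) mapOutputBytes / materializedBytes;
    }

    public static void main(String[] args) {
        // Hypothetical counter values, for illustration only.
        long mapOutputBytes = 8_000_000_000L;
        long materializedBytes = 1_000_000_000L;
        System.out.println("ratio = " + compressionRatio(mapOutputBytes, materializedBytes));
    }
}
```

This only tells me how well the final map output compressed; it says nothing about whether the intermediate spills were compressed, which is what I'm asking.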

Thanks,
Sigurd