Pig >> mail # user >> Snappy in Mapreduce


Re: Snappy in Mapreduce
Marek,

Map Output Bytes is the raw number of bytes emitted by the mapper; it is
counted before compression is applied. For an MR job, you probably want
to look at the File Bytes Written counter for the map phase, or Reduce
Shuffle Bytes for the reduce phase, to see the post-compression sizes.
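To make the distinction between the two counters concrete, here is a toy sketch in Python (using the stdlib zlib module as a stand-in for Snappy, whose Python binding is third-party): the "map output bytes" analogue counts the raw serialized records, while the "file bytes written" analogue counts what actually hits disk after the codec runs. The record format below is purely illustrative.

```python
import zlib

# Simulate a mapper emitting repetitive key/value records
# (the kind of data that compresses well).
records = [f"key{i % 10}\tvalue{i % 10}\n" for i in range(10_000)]
raw = "".join(records).encode()

map_output_bytes = len(raw)                    # counted BEFORE compression
file_bytes_written = len(zlib.compress(raw))   # counted AFTER the codec runs

print(f"map output bytes:   {map_output_bytes}")
print(f"file bytes written: {file_bytes_written}")
assert file_bytes_written < map_output_bytes   # the spill really did shrink
```

So a large, unchanged Map Output Bytes value does not by itself mean compression is off; only the on-disk counters reflect the codec's effect.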

On Wed, Feb 1, 2012 at 2:04 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> I have Cloudera's CDH3u2 package installed on a 3-node cluster, and I've added the following to mapred-site.xml:
>    <property>
>        <name>mapred.compress.map.output</name>
>        <value>true</value>
>    </property>
>
>    <property>
>        <name>mapred.map.output.compression.codec</name>
>        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>    </property>
>
> And the following to my Pig job properties:
>                <property>
>                    <name>io.compression.codec.lzo.class</name>
>                    <value>com.hadoop.compression.lzo.LzoCodec</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression.codec</name>
>                    <value>lzo</value>
>                </property>
>                <property>
>                    <name>mapred.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.type</name>
>                    <value>BLOCK</value>
>                </property>
>                <property>
>                    <name>mapred.compress.map.output</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.map.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>
> So I want Pig to compress its temporary data with LZO, but MapReduce to compress map output with Snappy. Yet, as I see in the tasktracker details (Map Output Bytes), the data is not compressed at all, which hurts performance badly (IO is at 100% most of the time)... What am I doing wrong, and how do I fix it?
>
>
> Thanks,
> Marek M.
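
As an aside, the same map-output compression properties could also be set inline from the Pig script itself, rather than in a separate properties file (a sketch, assuming Pig's `set` command, which passes properties through to the underlying MR jobs):

```pig
-- Sketch: enable Snappy map-output compression from within a Pig script.
set mapred.compress.map.output true;
set mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
```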

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about