Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # user >> Snappy in Mapreduce


+
Marek Miglinski 2012-02-01, 08:34
+
Harsh J 2012-02-01, 11:22
Copy link to this message
-
Re: Snappy in Mapreduce
Also, if you want finalized outputs in LZO, set
"mapred.output.compression.codec" to that codec. You have it set to
Snappy presently.

On Wed, Feb 1, 2012 at 2:04 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> I have a Clouderas CDH3U2 package installed on a 3 node cluster and I've added to mapred-site:
>    <property>
>        <name>mapred.compress.map.output</name>
>        <value>true</value>
>    </property>
>
>    <property>
>        <name>mapred.map.output.compression.codec</name>
>        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>    </property>
>
> Also to my pig job properties:
>                <property>
>                    <name>io.compression.codec.lzo.class</name>
>                    <value>com.hadoop.compression.lzo.LzoCodec</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression.codec</name>
>                    <value>lzo</value>
>                </property>
>                <property>
>                    <name>mapred.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.type</name>
>                    <value>BLOCK</value>
>                </property>
>                <property>
>                    <name>mapred.compress.map.output</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.map.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>
> So I want PIG to compress it's data with LZO but mapreduce with Snappy, but as I see in the tasktracker details (Map Bytes Out) data is not compressed at all, which reduces performance a lot (IO is 100% most of the time)... What am I doing wrong and how do I fix it?
>
>
> Thanks,
> Marek M.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
+
Marek Miglinski 2012-02-06, 13:14
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB