Pig >> mail # user >> Snappy in Mapreduce


Marek Miglinski 2012-02-01, 08:34
Harsh J 2012-02-01, 11:22
Re: Snappy in Mapreduce
Also, if you want finalized outputs in LZO, set
"mapred.output.compression.codec" to that codec. You have it set to
Snappy presently.
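
A minimal sketch of that change, assuming "that codec" means the same LzoCodec class already named in the quoted job properties, and that the LZO library providing it is installed on every node (neither is confirmed in this thread):

```xml
<!-- Sketch only: compress the final job output with LZO instead of Snappy. -->
<!-- Assumption: the class below is the LZO codec meant here; it is the one
     the quoted job properties register under io.compression.codec.lzo.class. -->
<property>
    <name>mapred.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapred.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

This only affects the finalized job output. The map output settings (mapred.compress.map.output and mapred.map.output.compression.codec) are separate properties and can stay on Snappy.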

On Wed, Feb 1, 2012 at 2:04 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> I have Cloudera's CDH3U2 package installed on a 3-node cluster, and I've added the following to mapred-site.xml:
>    <property>
>        <name>mapred.compress.map.output</name>
>        <value>true</value>
>    </property>
>
>    <property>
>        <name>mapred.map.output.compression.codec</name>
>        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>    </property>
>
> I've also added these to my Pig job properties:
>                <property>
>                    <name>io.compression.codec.lzo.class</name>
>                    <value>com.hadoop.compression.lzo.LzoCodec</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>pig.tmpfilecompression.codec</name>
>                    <value>lzo</value>
>                </property>
>                <property>
>                    <name>mapred.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapred.output.compression.type</name>
>                    <value>BLOCK</value>
>                </property>
>                <property>
>                    <name>mapred.compress.map.output</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapred.map.output.compression.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress</name>
>                    <value>true</value>
>                </property>
>                <property>
>                    <name>mapreduce.map.output.compress.codec</name>
>                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>                </property>
>
> So I want Pig to compress its data with LZO and MapReduce to use Snappy. However, according to the tasktracker details (Map Bytes Out), the data is not compressed at all, which hurts performance a lot (IO is at 100% most of the time). What am I doing wrong, and how do I fix it?
>
>
> Thanks,
> Marek M.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
Marek Miglinski 2012-02-06, 13:14