Re: Snappy Compression Json Data
If 'STORE' worked, LOAD should work fine too.
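
For instance, a minimal sketch of reading the compressed output back (the alias name is arbitrary, $OUTPUT is the path from the script quoted below, and this assumes the Snappy codec and native libraries are configured on the cluster):

-- Hadoop resolves the codec from the .snappy extension, so PigStorage
-- sees the already-decompressed, tab-delimited records on read.
deduped = LOAD '$OUTPUT' USING PigStorage();
DUMP deduped;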

On Thu, Oct 13, 2011 at 6:29 PM, Cameron Gandevia <[EMAIL PROTECTED]> wrote:

> Hi
>
> I currently have a bunch of data in JSON format in HDFS. I would like to
> use Pig to load it, dedupe it, and store it back using Snappy compression.
>
> Currently I do something like this:
>
> raw = LOAD '$INPUT' USING PigJsonLoader();
> uniq = DISTINCT raw;
> STORE uniq INTO '$OUTPUT' USING PigStorage();
>
> If I add the following to the Pig job, it seems to write the files with a
> '.snappy' extension:
>
> <property>
>   <name>mapred.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapred.output.compression.type</name>
>   <value>BLOCK</value>
> </property>
>
> Is this all I need to do, or do I need to write it in a different format?
> And is there a way to load the Snappy-compressed JSON data, or do I need
> to implement a new load function?
>
> Any help is much appreciated.
>
> Thanks
>
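
As for the properties quoted above: they can also be set from the script itself instead of the job XML. A minimal sketch, assuming Pig 0.8 or later, where SET forwards arbitrary key/value pairs to the job configuration:

-- Equivalent to the <property> entries quoted above, set per-script.
set mapred.output.compress true;
set mapred.output.compression.codec 'org.apache.hadoop.io.compress.SnappyCodec';
set mapred.output.compression.type 'BLOCK';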