Pig user mailing list - Snappy Compression Json Data


Re: Snappy Compression Json Data
Raghu Angadi 2011-10-15, 00:13
If 'STORE' worked, LOAD should work fine too.
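
For reference, a minimal sketch of the round trip, assuming the Snappy codec is registered in io.compression.codecs on the cluster nodes, and using Pig's SET command to pass the same job properties from inside the script instead of editing the job XML (property names taken from your mail):

  SET mapred.output.compress 'true';
  SET mapred.output.compression.codec 'org.apache.hadoop.io.compress.SnappyCodec';
  SET mapred.output.compression.type 'BLOCK';

  raw  = LOAD '$INPUT' USING PigJsonLoader();
  uniq = DISTINCT raw;
  STORE uniq INTO '$OUTPUT' USING PigStorage();

  -- Reading the '.snappy' output back: the underlying input format picks the
  -- codec from the file extension, so no custom load function should be needed,
  -- provided the Snappy native libraries are available on the nodes.
  stored = LOAD '$OUTPUT' USING PigStorage();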

On Thu, Oct 13, 2011 at 6:29 PM, Cameron Gandevia <[EMAIL PROTECTED]> wrote:

> Hi
>
> I currently have a bunch of data in JSON format in HDFS. I would like to
> use Pig to load it, dedupe it, and store it back using Snappy compression.
>
> Currently I do something like this.
>
> raw = LOAD '$INPUT' USING PigJsonLoader();
> uniq = DISTINCT raw;
> STORE uniq INTO '$OUTPUT' USING PigStorage();
>
> If I add the following to the Pig job, it seems to write the files with a
> '.snappy' extension:
>
> <property>
>   <name>mapred.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapred.output.compression.type</name>
>   <value>BLOCK</value>
> </property>
>
> Is this all I need to do, or do I need to write it in a different format?
> And is there a way to load the Snappy-compressed JSON data, or do I need to
> implement a new load function?
>
> Any help is much appreciated.
>
> Thanks
>