Pig >> mail # user >> AvroStorage compression ratio


Thejas Nair 2012-10-21, 05:22
Re: AvroStorage compression ratio
How do you generate your Avro files?
It worked OK for me with:

SET avro.mapred.deflate.level 5;
inputData = LOAD 'input path'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();
STORE inputData INTO 'output path'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();

But I did these tests a long time ago with an old version.

Ruslan

On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> Based on the AvroStorage code and documentation, it looks like compression
> is enabled by default, with the codec set to "deflate". But the file size is
> almost the same as that of the uncompressed tab-separated text data.
>
> This is probably a bug in AvroStorage, but I wanted to check whether this is
> somehow expected before I open a JIRA to track it.
>
> Uncompressed txt              2.12 GB
> avro (default compression)    2.09 GB
> avro + snappy compression     2.09 GB
> lzo compressed txt            0.69 GB
>
>
> Thanks,
> Thejas
>
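
One way to check whether the output files were actually written with a codec is to read the `avro.codec` entry from the header of an Avro container file. Below is a minimal sketch in Python, using only the standard library and following the Avro object container file layout (4-byte magic, then a metadata map). The function names and the file name in the last comment are made up for illustration, and the sketch assumes the part file has first been copied out of HDFS to local disk.

```python
MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    acc = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of Avro header")
        byte = raw[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo zigzag


def read_avro_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))
    return meta


def avro_codec(path):
    """Return the codec name stored in a local Avro file's header."""
    with open(path, "rb") as f:
        meta = read_avro_metadata(f)
    # The Avro spec treats a missing avro.codec entry as "null" (no compression).
    return meta.get("avro.codec", b"null").decode("utf-8")


# Hypothetical usage on a locally copied part file:
# print(avro_codec("part-m-00000.avro"))
```

If this reports `null` for files that were expected to be deflate- or snappy-compressed, that would be consistent with the nearly identical sizes above, and would suggest the codec setting never reached the writer.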
Thejas Nair 2012-10-22, 22:51
Ruslan Al-Fakikh 2012-10-23, 13:31