Re: AvroStorage compression ratio
What was the compression ratio you saw?
I get the correct results, but the data size is almost the same as
the uncompressed text.

searches = load  '/user/testuser/aol_search_logs.txt' as (ID : int,
Query : chararray, QueryTime : chararray, ItemRank : int, ClickURL :
store searches into '/user/testuser/aol_search_logs.avro'  using

I also tried:

SET avro.output.codec snappy
SET mapred.output.compress true
searches = load '/user/testuser/aol_search_logs.avro'  using
store searches into '/user/testuser/aol_search_logs.snappy.avro' using
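
One way to check whether the codec setting actually reached the writer is to look at the avro.codec entry in the file header. Below is a minimal sketch in Python, assuming a part file has been copied out of HDFS to local disk (the file name in the usage comment is hypothetical). It reads only the container header as described in the Avro spec and does not use the avro library; the function names are my own.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long: variable-length bytes, zigzag encoded."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' / 'string')."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:   # negative count: a block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def avro_codec(stream):
    """Return the codec named in the header; absent means 'null'."""
    meta = read_header_metadata(stream)
    return meta.get("avro.codec", b"null").decode("utf-8")


# Usage (hypothetical local file name):
#   with open("part-m-00000.avro", "rb") as f:
#       print(avro_codec(f))
```

If this reports null for the files above, the blocks were written uncompressed, which would be consistent with the sizes I am seeing.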


On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
> How do you generate your Avro files?
> It worked OK for me with:
> SET avro.mapred.deflate.level 5
> inputData = LOAD 'input path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> STORE inputData INTO 'output path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> But I did these tests a long time ago with an old version.
> Ruslan
> On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>> Based on the AvroStorage code and documentation, it looks like compression is
>> enabled by default, with the codec set to "deflate". But the file size is almost
>> the same as that of the uncompressed tab-separated text data.
>> This is probably a bug in AvroStorage, but I wanted to check if this is
>> somehow expected, before I open a jira to track it.
>> Uncompressed txt              2.12 GB
>> avro (default compression)    2.09 GB
>> avro + snappy compression     2.09 GB
>> lzo compressed txt            0.69 GB
>> Thanks,
>> Thejas