Re: AvroStorage compression ratio
Thejas Nair 2012-10-22, 22:51
What was the compression ratio you saw?
I get the correct results, but the data size is almost same as uncompressed text.

searches = load '/user/testuser/aol_search_logs.txt' as (ID : int, Query : chararray, QueryTime : chararray, ItemRank : int, ClickURL : chararray);
store searches into '/user/testuser/aol_search_logs.avro' using AvroStorage();

I also tried -

SET avro.output.codec snappy
SET mapred.output.compress true
searches = load '/user/testuser/aol_search_logs.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
store searches into '/user/testuser/aol_search_logs.snappy.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

-Thejas

On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
> How do you generate your Avro files?
> It worked OK for me with:
>
> SET avro.mapred.deflate.level 5
> inputData = LOAD 'input path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> STORE inputData INTO 'output path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> But I did these tests a long time ago with an old version.
>
> Ruslan
>
> On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>> Based on AvroStorage code and documentation, it looks like compression is
>> enabled by default, codec set to "deflate". But the file size is almost same
>> as that of uncompressed tab separated text data.
>>
>> This is probably a bug in AvroStorage, but I wanted to check if this is
>> somehow expected, before I open a jira to track it.
>>
>> Uncompressed txt             2.12 GB
>> avro (default compression)   2.09 GB
>> avro + snappy compression    2.09 GB
>> lzo compressed txt           0.69 GB
>>
>> Thanks,
>> Thejas
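
For reference, the sizes quoted in the thread can be turned into ratios against the uncompressed text file. This is only a small Python sketch of that arithmetic; the dictionary and the helper name `ratios_vs_baseline` are made up for illustration and are not part of Pig or AvroStorage.

```python
# Sizes in GB, copied from the table quoted in the thread.
sizes_gb = {
    "uncompressed txt": 2.12,
    "avro (default compression)": 2.09,
    "avro + snappy compression": 2.09,
    "lzo compressed txt": 0.69,
}


def ratios_vs_baseline(sizes, baseline_key):
    """Return each size divided by the baseline size (hypothetical helper)."""
    base = sizes[baseline_key]
    return {name: size / base for name, size in sizes.items()}


ratios = ratios_vs_baseline(sizes_gb, "uncompressed txt")
for name, ratio in ratios.items():
    # A ratio near 1.00 means the output is barely smaller than the input.
    print(f"{name:28s} {ratio:.2f}")
```

By this measure both Avro outputs are about 99% of the uncompressed size, while the LZO-compressed text is about 33%, which is the gap the thread is asking about.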