Re: AvroStorage compression ratio
Ruslan Al-Fakikh 2012-10-23, 13:31
For me it was:
27.5G for uncompressed tab-delimited plain txt

when compressed:

Format                       Size
sequence files               1.6G
avro deflate with level 1    2.9G
avro deflate with level 5    2.4G
avro deflate with level 9    2.2G
avro snappy                  4.1G

I was using this:
https://ccp.cloudera.com/display/CDHDOC/Avro+Usage#AvroUsage-Pig
with CDH 3

Best Regards

On Tue, Oct 23, 2012 at 2:51 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> What was the compression ratio you saw?
> I get the correct results, but the data size is almost same as uncompressed
> text.
>
> searches = load '/user/testuser/aol_search_logs.txt' as (ID : int, Query :
> chararray, QueryTime : chararray, ItemRank : int, ClickURL : chararray);
> store searches into '/user/testuser/aol_search_logs.avro' using
> AvroStorage();
>
> I also tried -
>
> SET avro.output.codec snappy
> SET mapred.output.compress true
> searches = load '/user/testuser/aol_search_logs.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> store searches into '/user/testuser/aol_search_logs.snappy.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> -Thejas
>
>
> On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
>>
>> How do you generate your Avro files?
>> It worked OK for me with:
>>
>> SET avro.mapred.deflate.level 5
>> inputData = LOAD 'input path' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>> STORE inputData INTO 'output path' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> But I did these tests a long time ago with an old version.
>>
>> Ruslan
>>
>> On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Based on AvroStorage code and documentation, it looks like compression is
>>> enabled by default, codec set to "deflate". But the file size is almost
>>> same
>>> as that of uncompressed tab separated text data.
>>>
>>> This is probably a bug in AvroStorage, but I wanted to check if this is
>>> somehow expected, before I open a jira to track it.
>>>
>>> Uncompressed txt 2.12 GB
>>> avro (default compression) 2.09 GB
>>> avro + snappy compression 2.09 GB
>>> lzo compressed txt 0.69 GB
>>>
>>>
>>> Thanks,
>>> Thejas
>>>
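
To put the two sets of numbers in this thread side by side, here is a rough Python sketch that turns the reported sizes into compression ratios (uncompressed size divided by compressed size). The figures are copied from the messages above; the names (ratios, ruslan_sizes_g, thejas_sizes_gb) are only illustrative, and the two datasets are different, so the ratios are indicative rather than directly comparable.

```python
def ratios(uncompressed, sizes):
    """Return {format: uncompressed / compressed}, rounded to 2 decimals."""
    return {name: round(uncompressed / size, 2) for name, size in sizes.items()}


# Sizes reported by Ruslan (27.5G of tab-delimited text as the baseline).
ruslan_sizes_g = {
    "sequence files": 1.6,
    "avro deflate level 1": 2.9,
    "avro deflate level 5": 2.4,
    "avro deflate level 9": 2.2,
    "avro snappy": 4.1,
}
ruslan_ratios = ratios(27.5, ruslan_sizes_g)

# Sizes reported by Thejas (2.12 GB of uncompressed text as the baseline).
thejas_sizes_gb = {
    "avro (default compression)": 2.09,
    "avro + snappy compression": 2.09,
    "lzo compressed txt": 0.69,
}
thejas_ratios = ratios(2.12, thejas_sizes_gb)

if __name__ == "__main__":
    for label, table in (("Ruslan", ruslan_ratios), ("Thejas", thejas_ratios)):
        print(label)
        for name, ratio in table.items():
            print(f"  {name:28s} {ratio:6.2f}x")
```

A ratio close to 1x, as in the Avro rows of the second set, means the output is essentially the same size as the input text, which is consistent with the codec not being applied in that run.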