Re: AvroStorage compression ratio
Thejas Nair 2012-10-22, 22:51
What was the compression ratio you saw?
I get the correct results, but the data size is almost same as
searches = load '/user/testuser/aol_search_logs.txt' as (ID : int,
Query : chararray, QueryTime : chararray, ItemRank : int, ClickURL :
store searches into '/user/testuser/aol_search_logs.avro' using
I also tried -
SET avro.output.codec snappy
SET mapred.output.compress true
searches = load '/user/testuser/aol_search_logs.avro' using
store searches into '/user/testuser/aol_search_logs.snappy.avro' using
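One way to tell whether a codec setting actually reached the writer is to look at the header of an output file. The Avro object container format starts with the magic bytes "Obj" plus 0x01, followed by a metadata map in which the key avro.codec names the block codec (if the key is absent, the codec is "null", i.e. no compression). The following is a minimal sketch in plain Python, not part of AvroStorage or Pig; the function names are made up for illustration.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one zigzag, variable-length long as defined by the Avro spec."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        accum |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):  # high bit clear: last byte of this long
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' and 'string' share this layout)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_header_metadata(stream):
    """Return the file-level metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a byte size follows, which we skip
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def codec_of(stream):
    """Name of the block codec; 'null' when the header does not set one."""
    return read_header_metadata(stream).get("avro.codec", b"null").decode("utf-8")
```

To use it, copy one part file out of the output directory to local disk, open it in binary mode, and pass the file object to codec_of. If the result is "null" even with the SET lines in place, the settings are probably not being picked up by the storage function.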
On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
> How do you generate your Avro files?
> It worked OK for me with:
> SET avro.mapred.deflate.level 5
> inputData = LOAD 'input path' USING
> STORE inputData INTO 'output path' USING
> But I did these tests a long time ago with an old version.
> On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>> Based on the AvroStorage code and documentation, it looks like compression is
>> enabled by default, with the codec set to "deflate". But the file size is almost
>> the same as that of the uncompressed tab-separated text data.
>> This is probably a bug in AvroStorage, but I wanted to check if this is
>> somehow expected, before I open a jira to track it.
>> Uncompressed txt              2.12 GB
>> avro (default compression)    2.09 GB
>> avro + snappy compression     2.09 GB
>> lzo compressed txt            0.69 GB
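
To put the sizes above in ratio terms, here is a small sketch that computes the space saved relative to the uncompressed text file. The numbers are the ones reported in the thread; the dictionary and the function name are only for illustration.

```python
# Sizes in GB as reported in the thread.
sizes_gb = {
    "uncompressed txt": 2.12,
    "avro (default compression)": 2.09,
    "avro + snappy compression": 2.09,
    "lzo compressed txt": 0.69,
}


def percent_saved(baseline, size):
    """Space saved relative to the baseline, as a percentage."""
    return 100.0 * (1.0 - size / baseline)


baseline = sizes_gb["uncompressed txt"]
report = {
    name: round(percent_saved(baseline, size), 1)
    for name, size in sizes_gb.items()
}

for name, saved in report.items():
    print(f"{name:<28} {saved:5.1f}% smaller than uncompressed txt")
```

Both Avro outputs come out roughly 1.4% smaller than the text file, while the LZO-compressed text is about 67.5% smaller, which is the gap the thread is asking about.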