Pig >> mail # user >> AvroStorage compression ratio


Thejas Nair 2012-10-21, 05:22
Ruslan Al-Fakikh 2012-10-22, 13:02
Re: AvroStorage compression ratio
What was the compression ratio you saw?
I get the correct results, but the data size is almost the same as the
uncompressed text.

searches = load  '/user/testuser/aol_search_logs.txt' as (ID : int,
Query : chararray, QueryTime : chararray, ItemRank : int, ClickURL :
chararray);
store searches into '/user/testuser/aol_search_logs.avro'  using
AvroStorage();

I also tried -

SET avro.output.codec snappy;
SET mapred.output.compress true;
searches = load '/user/testuser/aol_search_logs.avro'  using
org.apache.pig.piggybank.storage.avro.AvroStorage();
store searches into '/user/testuser/aol_search_logs.snappy.avro' using
org.apache.pig.piggybank.storage.avro.AvroStorage();
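For comparison, a minimal sketch that sets all three properties mentioned in this thread at once, so that compression and the codec are requested explicitly instead of relying on defaults. It assumes the piggybank AvroStorage in use actually reads mapred.output.compress, avro.output.codec and avro.mapred.deflate.level when writing; given the snappy result in the size table, that assumption should be checked against the installed Pig/piggybank version.

```pig
-- Hedged sketch, not a confirmed fix: assumes this piggybank AvroStorage
-- honors the three properties below when it writes the Avro container.

-- Enable job output compression explicitly.
SET mapred.output.compress true;
-- Choose the Avro codec.
SET avro.output.codec deflate;
-- Deflate level, same value as in Ruslan's test.
SET avro.mapred.deflate.level 5;
```

These lines would go before the LOAD/STORE statements of the script.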

-Thejas

On 10/22/12 6:02 AM, Ruslan Al-Fakikh wrote:
> How do you generate your Avro files?
> It worked OK for me with:
>
> SET avro.mapred.deflate.level 5;
> inputData = LOAD 'input path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> STORE inputData INTO 'output path' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> But I did these tests a long time ago with an old version.
>
> Ruslan
>
> On Sun, Oct 21, 2012 at 9:22 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>> Based on the AvroStorage code and documentation, it looks like compression is
>> enabled by default, with the codec set to "deflate". But the file size is almost
>> the same as that of the uncompressed tab-separated text data.
>>
>> This is probably a bug in AvroStorage, but I wanted to check whether this is
>> somehow expected before I open a JIRA to track it.
>>
>> Uncompressed txt              2.12 GB
>> avro (default compression)    2.09 GB
>> avro + snappy compression     2.09 GB
>> lzo compressed txt            0.69 GB
>>
>>
>> Thanks,
>> Thejas
>>
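Working from the sizes in the table, the implied ratios (compressed size divided by uncompressed text size) are shown below. The GB figures are rounded, so the ratios are approximate.

```latex
\begin{aligned}
\text{avro (default compression)} &: \tfrac{2.09}{2.12} \approx 0.986 \quad (\approx 1.4\%\ \text{smaller})\\
\text{avro + snappy compression} &: \tfrac{2.09}{2.12} \approx 0.986 \quad (\approx 1.4\%\ \text{smaller})\\
\text{lzo compressed txt} &: \tfrac{0.69}{2.12} \approx 0.325 \quad (\approx 3.1{:}1)
\end{aligned}
```

Both Avro outputs are therefore only marginally smaller than the text, while LZO on the same data gives roughly a 3:1 reduction.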
Ruslan Al-Fakikh 2012-10-23, 13:31