Avro user mailing list: avro compression using snappy and deflate

snikhil0 2012-03-30, 07:43
easyvoip@... 2012-03-30, 07:51
snikhil0 2012-03-30, 07:54
Shirahatti, Nikhil 2012-03-30, 19:08
Tatu Saloranta 2012-03-30, 19:45
Re: avro compression using snappy and deflate

On 3/30/12 12:08 PM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:

>Hello All,
>I think I figured out where I goofed up.
>I was flushing on every record, so basically this was compression per
>record, and each record carried its own metadata. This was adding more
>data to the output compared to plain Avro.
>So now I have better figures: at least they look realistic; I still need
>to find out if it is map-reducible.
>Avro = 12G
>Avro+Deflate = 4.5G

Deflate is affected quite a bit by the compression level selected (1 to 9),
in both speed and compression ratio.  However, in my experience anything
past level 6 is only very slightly smaller and much slower, while the
difference between levels 1 and 3 is large on both fronts.
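The level trade-off is easy to measure directly with `zlib` (again a stand-in for Avro's deflate codec; the sample payload is invented, and real data will show different absolute numbers):

```python
import zlib

# Hypothetical repetitive row-oriented payload, like a log or CSV extract.
data = b"2012-03-30,click,us-east,200,/index.html\n" * 5000

# Compressed size at a few deflate levels (1 = fastest, 9 = smallest).
sizes = {level: len(zlib.compress(data, level)) for level in (1, 3, 6, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Typically the output shrinks noticeably between levels 1 and 3 and only marginally past level 6, matching the experience described above.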

>Avro+Snappy = 5.5G
>Have others tried Avro + LZO?

I have not heard of anyone doing this.  LZO is not Apache-license
compatible, and there are now several alternatives available in the same
class of compression algorithm, including Snappy.

>On 3/30/12 12:54 AM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:
>>The original data file (a text file) is 40GB, the avro file is about
>>avro snappy is 13GB!