On 3/30/12 12:08 PM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:
>I think I figured out where I goofed up.
>I was flushing on every record, so compression was effectively applied per
>record and each record carried its own metadata. This added more data to
>the output compared to Avro.
>So now I have better figures: they at least look realistic. I still need to
>find out if it is map-reduceable.
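For anyone hitting the same thing, the effect of compressing per record can be reproduced outside Avro. The following is only a rough sketch using Python's zlib, with made-up records standing in for real data (the record layout and count are assumptions, not the figures from this thread). It does not model Avro's own per-block bookkeeping, such as the sync marker written at the end of every block, which would add further overhead on top.

```python
import zlib

# Hypothetical records standing in for serialized rows; not the poster's data.
records = [
    f"user_{i},2012-03-30,click,page_{i % 10}\n".encode()
    for i in range(1000)
]

# Compress each record on its own, roughly what happens when the writer
# is flushed after every record and each block holds a single record.
per_record_size = sum(len(zlib.compress(r)) for r in records)

# Compress all records together as one block, so the compressor can
# exploit repetition across records.
batched_size = len(zlib.compress(b"".join(records)))

raw_size = sum(len(r) for r in records)
print(f"raw: {raw_size} bytes")
print(f"compressed per record: {per_record_size} bytes")
print(f"compressed as one block: {batched_size} bytes")
```

The per-record total pays the compressor's fixed header and trailer once per record and never sees repetition across records, which is why letting the writer fill whole blocks before flushing gives much better figures.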
Deflate's speed and compression ratio both depend heavily on the
compression level selected (1 to 9). However, in my experience anything
past level 6 is only very slightly smaller and much slower, while the
difference between levels 1 and 3 is large on both fronts.
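A quick way to check this on your own data is to time each level directly. This is a minimal sketch using Python's zlib (the same Deflate algorithm, though not Avro's codec wrapper), and the sample data is invented for illustration, so the exact numbers will differ from what you see with real files.

```python
import time
import zlib

# Hypothetical sample: repetitive text lines standing in for a log-like file.
data = b"".join(
    f"{i},user_{i % 500},GET /item/{i % 97},status=200\n".encode()
    for i in range(50_000)
)

def measure(level):
    """Return (compressed size in bytes, elapsed seconds) for one level."""
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    return size, time.perf_counter() - start

# Deflate levels run from 1 (fastest) to 9 (smallest output).
results = {level: measure(level) for level in range(1, 10)}

for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes, {seconds:.3f}s")
```

Comparing the size and time columns for levels 1, 3, 6 and 9 should show where the returns start to diminish for a given data set.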
>Avro+Snappy = 5.5G
>Have others tried Avro + LZO?
I have not heard of anyone doing this. LZO is not Apache license
compatible, and there are now several alternatives that are in the same
class of compression algorithm available, including Snappy.
>On 3/30/12 12:54 AM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:
>>The original data file (a text file) is 40GB, the avro file is about
>>avro snappy is 13GB!