On Fri, Mar 30, 2012 at 12:08 PM, Shirahatti, Nikhil
<[EMAIL PROTECTED]> wrote:
> Hello All,
> I think I figured out where I goofed up.
> I was flushing on every record, so basically each record was compressed
> on its own and carried its own metadata. This was adding more data to
> the output when compared to plain Avro.
> So now I have better figures. They at least look realistic; I still need
> to find out if the output is map-reduceable.
> Avro = 12G
> Avro + Deflate = 4.5G
> Avro + Snappy = 5.5G
> Have others tried Avro + LZO?
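
As a side note on the per-record flushing issue described above, here is a
rough sketch of why it hurts. It does not use Avro at all: it only uses
Python's standard zlib module on made-up records (make_records and the
field names are hypothetical), and compares compressing every record as
its own deflate stream against compressing blocks of 1000 records. A real
Avro container file also writes a block header and a sync marker per block,
which this sketch ignores, so treat the numbers as illustrative only.

```python
import json
import zlib


def make_records(n):
    # Hypothetical sample data: small, repetitive JSON-like records.
    return [
        json.dumps(
            {"id": i, "user": "user%d" % (i % 50), "event": "page_view"}
        ).encode("utf-8")
        for i in range(n)
    ]


def compressed_size_per_record(records):
    # Every record becomes its own deflate stream, roughly what happens
    # when the writer is flushed after each append.
    return sum(len(zlib.compress(r)) for r in records)


def compressed_size_per_block(records, block_size):
    # Records are concatenated into blocks and each block is compressed once,
    # so the codec can exploit redundancy across records.
    total = 0
    for start in range(0, len(records), block_size):
        block = b"".join(records[start:start + block_size])
        total += len(zlib.compress(block))
    return total


records = make_records(10000)
raw = sum(len(r) for r in records)
per_record = compressed_size_per_record(records)
per_block = compressed_size_per_block(records, 1000)

print("raw bytes:        ", raw)
print("per-record bytes: ", per_record)
print("per-block bytes:  ", per_block)
```

With records this small, each per-record stream pays for its own header and
checksum and has almost no redundancy to work with, so the per-block total
should come out far smaller.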
Have you checked out the jvm-compressor-benchmark page?
It has a comparison of quite a few native open source compression codecs.
While the test data does not include Avro, I would not expect results to
differ all that much.
LZO isn't a particularly compelling codec in any of the combinations
tested. Snappy, LZF and LZ4 (not yet included in the public results, but
there's code, and preliminary results are very good) are the fastest.
Gzip (deflate) produces more compact results, and is the fastest of the
"high compression" codecs (although significantly slower than
lzf/snappy/lz4).
-+ Tatu +-
ps. If anyone has a publicly available set of Avro data, it would be
quite easy to add an Avro-data test to the jvm compressor benchmark.