Avro >> mail # user >> avro compression using snappy and deflate

snikhil0 2012-03-30, 07:43
easyvoip@... 2012-03-30, 07:51
snikhil0 2012-03-30, 07:54
Shirahatti, Nikhil 2012-03-30, 19:08
Re: avro compression using snappy and deflate
On Fri, Mar 30, 2012 at 12:08 PM, Shirahatti, Nikhil wrote:
> Hello All,
> I think I figured out where I goofed up.
> I was flushing on every record, so this was effectively compression per
> record, with metadata written alongside each record. That added more data
> to the output compared to plain Avro.
> So now I have better figures. They at least look realistic, though I still
> need to find out if the output is map-reduceable.
> Avro = 12G
> Avro+Deflate = 4.5G
> Avro+Snappy = 5.5G
> Have others tried Avro + LZO?
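
To illustrate the flush-per-record problem described above, here is a minimal sketch in Python. It does not use the Avro API at all: it uses the standard zlib module (deflate) as a stand-in, and the records, field names and BLOCK_RECORDS value are made up for illustration. It only shows why compressing each record on its own tends to do much worse than compressing many records together as one block.

```python
import json
import zlib

# Hypothetical records: small, similar JSON rows standing in for Avro records.
records = [
    json.dumps(
        {"id": i, "user": f"user{i % 50}", "event": "click", "page": "/home"}
    ).encode("utf-8")
    for i in range(10_000)
]

raw_size = sum(len(r) for r in records)

# "Flush on every record": each record is its own compressed unit, so each one
# pays the compressor's framing overhead and cannot reuse earlier records
# as history.
per_record_size = sum(len(zlib.compress(r)) for r in records)

# "Flush per block": many records are compressed together in one unit.
BLOCK_RECORDS = 1_000  # assumed block size in records, chosen arbitrarily
per_block_size = sum(
    len(zlib.compress(b"".join(records[i:i + BLOCK_RECORDS])))
    for i in range(0, len(records), BLOCK_RECORDS)
)

print(f"raw:        {raw_size} bytes")
print(f"per record: {per_record_size} bytes")
print(f"per block:  {per_block_size} bytes")
```

In a real Avro data file each block additionally carries a record count, a byte length and a 16-byte sync marker, so one-record blocks pay that cost per record as well.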

Have you checked out the jvm-compressor-benchmark page?
It has a comparison of quite a few native open source compression codecs.
While the test data does not include Avro, I would not expect the results to
differ all that much.

LZO isn't a particularly compelling codec in any of the combinations
tested. Snappy, LZF and LZ4 (not yet included in the public results, but
there's code, and preliminary results are very good) are the fastest
Java codecs.
Gzip (deflate) produces more compact results, and is the fastest of the "high
compression" codecs (although significantly slower than lzf/snappy/lz4).

-+ Tatu +-

ps. If anyone has a publicly available set of Avro data, it would be
quite easy to add an Avro-data test to the jvm compressor benchmark.
Scott Carey 2012-04-02, 15:31