Avro, mail # user - Avro file size is too big


Avro file size is too big
Ruslan Al-Fakikh 2012-07-04, 13:32
Hello,

My organization is currently evaluating Avro as a storage format. Our
concern is file size. I've done some comparisons on a sample of our
data.
Say we have compressed sequence files. The payload (the values) is just
lines of text. As far as I know, we use the line number as the key and the
default codec for compression inside the sequence files. The size is 1.6G;
when I convert it to Avro with the deflate codec at deflate level 9, it
becomes 2.2G.
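
One factor that might be worth checking here, offered as an assumption rather than a confirmed cause: Avro container files compress each data block on its own, so the amount of data per block can affect the overall ratio. The sketch below uses only Python's standard zlib module and synthetic tab-delimited lines (the helper names and block sizes are hypothetical, and this is not the Avro writer itself) to compare one deflate stream against many independently deflated blocks at level 9.

```python
import random
import zlib


def make_sample_lines(n_lines, seed=0):
    """Build synthetic tab-delimited text (a hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    words = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
    lines = []
    for i in range(n_lines):
        fields = [str(i), rng.choice(words), rng.choice(words),
                  str(rng.randint(0, 9999))]
        lines.append("\t".join(fields))
    return ("\n".join(lines) + "\n").encode("utf-8")


def size_single_stream(data, level=9):
    """Compressed size when the whole input is one deflate stream."""
    return len(zlib.compress(data, level))


def size_independent_blocks(data, block_size, level=9):
    """Compressed size when each block is deflated separately."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size], level))
    return total


data = make_sample_lines(50_000)
whole = size_single_stream(data)
small_blocks = size_independent_blocks(data, 4_000)      # illustrative size only
large_blocks = size_independent_blocks(data, 256_000)    # illustrative size only
print(len(data), whole, small_blocks, large_blocks)
```

If the small-block total comes out noticeably larger on a sample of the real data, the block size used when writing the Avro file may be one thing to experiment with.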
This is surprising, because the values in the sequence files are just strings,
while the Avro file has a proper schema with primitive types, and those are
stored in binary. Shouldn't the Avro file be smaller?
I also took another dataset, which is 28G (gzip files, plain
tab-delimited text; I don't know the deflate level), converted it
to Avro, and it became 38G.
Why is the Avro output so large? Am I missing some size optimization?
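
To put the two comparisons on the same scale, here is a small sketch of the arithmetic using only the sizes quoted in this message (the function name is hypothetical). Both conversions grow by a similar fraction, roughly 37.5% and 35.7%.

```python
def growth_percent(before_gb, after_gb):
    """Relative size increase, in percent, after conversion."""
    return (after_gb - before_gb) / before_gb * 100.0


# Sizes quoted in this message, in gigabytes.
seq_to_avro = growth_percent(1.6, 2.2)     # sequence files -> Avro
gzip_to_avro = growth_percent(28.0, 38.0)  # gzip text -> Avro
print(round(seq_to_avro, 1), round(gzip_to_avro, 1))
```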

Thanks in advance!