Avro, mail # user - Avro file size is too big


Ruslan Al-Fakikh 2012-07-04, 13:32
Re: Avro file size is too big
Russell Jurney 2012-07-04, 21:58
This thread looks useful. Are you flushing too often?
http://apache-avro.679487.n3.nabble.com/avro-compression-using-snappy-and-deflate-td3870167.html
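To illustrate why flushing matters: an Avro data file compresses each block independently and ends it with a 16-byte sync marker, and flushing the writer closes the current block. Below is a rough stdlib-only sketch of that effect, not Avro itself. The record layout, the block sizes, and the `compressed_size` helper are made up for illustration, and `zlib.compress` adds a small header that Avro's raw deflate does not, so treat the numbers as approximate.

```python
import zlib

SYNC_MARKER_BYTES = 16  # each Avro data-file block ends with a 16-byte sync marker


def compressed_size(records, records_per_block, level=9):
    """Total bytes when every block of records is deflated on its own.

    Rough model only: ignores Avro's per-block count/size fields and
    the file header.
    """
    total = 0
    for start in range(0, len(records), records_per_block):
        block = b"".join(records[start:start + records_per_block])
        total += len(zlib.compress(block, level)) + SYNC_MARKER_BYTES
    return total


# Hypothetical tab-delimited rows standing in for the real data.
records = [
    f"{i}\tuser{i % 50}\t2012-07-04\tsome repeated payload text\n".encode()
    for i in range(20000)
]

raw = sum(len(r) for r in records)
per_record = compressed_size(records, records_per_block=1)    # flush after every record
large_blocks = compressed_size(records, records_per_block=4000)  # flush rarely

print(f"raw bytes:           {raw}")
print(f"1 record per block:  {per_record}")
print(f"4000 records/block:  {large_blocks}")
```

If the small-block total comes out far above the large-block total, that is the same pattern you would expect from flushing an Avro writer after every few records: the compressor never sees enough data per block to find repetition.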

Russell Jurney http://datasyndrome.com

On Jul 4, 2012, at 6:33 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> Hello,
>
> We are currently evaluating Avro as a data format in my organization. Our
> main concern is file size. I've done some comparisons on a sample of our
> data.
> Say we have compressed sequence files. The payload (values) is just
> lines of text. As far as I know, we use the line number as the key and the
> default compression codec inside the sequence files. The size is 1.6G;
> when I convert it to Avro with the deflate codec at deflate level 9, it
> becomes 2.2G.
> This is surprising, because the values in the sequence files are just
> strings, while Avro has a proper schema with primitive types, which are
> stored in binary. Shouldn't the Avro file be smaller?
> I also took another dataset of 28G (gzip files of plain
> tab-delimited text; I don't know the deflate level) and converted it
> to Avro, and it became 38G.
> Why is the Avro output so big? Am I missing some size optimization?
>
> Thanks in advance!
Ruslan Al-Fakikh 2012-07-05, 14:53
Doug Cutting 2012-07-05, 17:24
Ruslan Al-Fakikh 2012-07-05, 22:11
Doug Cutting 2012-07-05, 22:19
Ey-Chih chow 2012-07-18, 23:59
Harsh J 2012-07-20, 02:07
Ey-Chih chow 2012-07-20, 17:02
Ey-Chih chow 2012-07-20, 17:12
Doug Cutting 2012-07-20, 20:00
Ey-Chih chow 2012-07-20, 20:32