Avro >> mail # dev >> sync interval for AvroOutputFormat


sync interval for AvroOutputFormat
AvroOutputFormat supports setting the deflate level, but not the sync interval.
Was this a conscious decision (i.e., are there drawbacks to making the
sync interval larger)?

In some tests that I've done, Avro data files were over 50% smaller when I
upped the sync interval to 2 MB (default is 16000 bytes). I also saw a
modest speedup in building the files (I suspect my program was I/O-bound).
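
To illustrate why a larger sync interval can shrink a deflate-compressed file, here is a rough model using only the JDK. It is not Avro's writer: the class and method names (SyncIntervalModel, encodedSize, sampleRecords) are made up for this sketch, blocks are cut at byte offsets rather than record boundaries, and the per-block record-count and length fields are ignored. It only assumes that each block is deflated independently and followed by a 16-byte sync marker, so fewer, larger blocks mean fewer compressor restarts and fewer markers.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class SyncIntervalModel {
    // Assumption for this model: one 16-byte sync marker follows every block.
    static final int SYNC_MARKER_BYTES = 16;

    // Deflate one slice on its own (raw deflate, fresh dictionary) and
    // return the number of compressed bytes produced.
    static int deflatedLength(byte[] data, int offset, int length) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data, offset, length);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Total modeled size when the data is split into blocks of at most
    // syncInterval bytes, each compressed separately.
    static long encodedSize(byte[] data, int syncInterval) {
        long total = 0;
        for (int offset = 0; offset < data.length; offset += syncInterval) {
            int length = Math.min(syncInterval, data.length - offset);
            total += deflatedLength(data, offset, length) + SYNC_MARKER_BYTES;
        }
        return total;
    }

    // Hypothetical, repetitive text records standing in for real data.
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("user-").append(i % 1000).append(",status=ok,region=eu-west\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleRecords(200_000);
        System.out.println("16000-byte blocks: " + encodedSize(data, 16_000));
        System.out.println("2 MB blocks:       " + encodedSize(data, 2 * 1024 * 1024));
    }
}
```

How large the gap is depends heavily on the data, so this should not be read as reproducing the 50% figure. For files written directly (outside of AvroOutputFormat), DataFileWriter already exposes setSyncInterval(int).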

Would folks support a patch that adds the sync interval as a static
configuration option on AvroOutputFormat?

Best,
Joe