
Avro user mailing list


Re: C implementation - wrong number of records
> I have the following C code - https://gist.github.com/967968
> When I ran it on a 100000-record file, it says 100030. (Both the C and
> Python implementations count 100000.)
>
> What am I doing wrong?

You found a bug in the C library's file reader code; I've opened up a bug report for it:

https://issues.apache.org/jira/browse/AVRO-819

The problem is that the file reader code isn't propagating errors correctly up through the call stack. As a result, avro_file_reader_read doesn't detect EOF, and you loop through the final block of the file twice. That's where the extra 30 records in your count come from: in the file you're reading, the final block must contain 30 records.

I've got a patch ready for this; I'll test on a couple of platforms and then commit it to Subversion.

cheers
–doug