Re: Reading an avro data file with avro c
Hi again,

I found the reason for this. When the file is written with the method described below (in my earlier mail), the Avro data file ends up with multiple data blocks, each with a block count of 1. When the file is read back, EOF is checked after each sync block with:

int avro_reader_is_eof(avro_reader_t reader)
{
    if (is_file_io(reader)) {
        return feof(avro_reader_to_file(reader)->fp);
    }
    return 0;
}

However, at this point the whole file has already been read into memory (although not all of the bytes have been consumed yet), so feof returns a nonzero value even though there is still data left to process.
There is a really easy fix for this, though: change the EOF check function to

int avro_reader_is_eof(avro_reader_t reader)
{
    if (is_file_io(reader)) {
        struct _avro_reader_file_t *file_reader = avro_reader_to_file(reader);
        /* The stream hitting EOF is not enough: the buffer must also be drained. */
        if (feof(file_reader->fp)) {
            return file_reader->end == file_reader->cur;
        }
    }
    return 0;
}

How should I go about getting this change committed to the Avro repo? I found an old issue (https://issues.apache.org/jira/browse/AVRO-1238) where some improvements to EOF handling were made. Should I reopen that one, or create a new one? If I make the change, is it enough to attach the patch to the Jira ticket?

-Mika

On Oct 16, 2013, at 6:49 PM, Mika Ristimaki <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I encountered a possible bug in the Avro C API. If the following is done, it seems that the Avro data file reader cannot read the file correctly:
>
> while (has values to write) {
> Open file for writing
> Write a value to the file
> Close the writer.
> }
>
> However, a file written as follows can be read just fine:
>
> Open file for writing
> while (has values to write) {
> Write a value to the file
> }
> Close the file
>
> Here it is assumed that both reading and writing are done with the C API. The Java API can read data files written in C either way.
>
> Is this expected behaviour, a bug, or am I just missing something? See the attached C program that reproduces the problem.
>
> Thanks
> -Mika
>
>
> <main.c>