Re: avro_value_t or avro_datum_t
> // Assume the path variable holds a valid file path and that proper
> // error handling is in place.
>
> PART A:
> avro_value_t data;
> avro_file_reader_t fileReader;
>
> result = avro_file_reader(path, &fileReader);
> result = avro_file_reader_read_value(fileReader, &data);
>
> The second call above leads to a "segmentation fault".

> Am I missing anything in part A?

With the old datum API, avro_file_reader_read would allocate a new datum
instance for each record read from the file.  The new value API doesn't
allocate anything for you, so that if you're reading millions of records
from a file, you don't incur malloc/free overhead for each one of those
records.  That means that you have to allocate a value instance that
avro_file_reader_read_value can read into:

    avro_file_reader_t  reader;
    avro_schema_t  file_schema;
    avro_value_iface_t  *file_iface;
    avro_value_t  data;
    int  result;

    // Open the file and create an avro_value_t to read into.
    result = avro_file_reader(path, &reader);
    file_schema = avro_file_reader_get_writer_schema(reader);
    file_iface = avro_generic_class_from_schema(file_schema);
    avro_generic_value_new(file_iface, &data);

    // Read two records from the file.
    result = avro_file_reader_read_value(reader, &data);
    result = avro_file_reader_read_value(reader, &data);

Note that we're grabbing the writer schema from the file that we just
opened, so that we know that "data" is always an instance of the right
schema type.  Also note that when we read multiple records from the
file, we can reuse the "data" value instance.  Its contents will be
overwritten with each successive record from the file.
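Putting the pieces together, the usual pattern is to allocate the value once and loop until avro_file_reader_read_value stops returning 0, then release everything. Here is a minimal sketch along those lines; the file name is hypothetical, error handling is reduced to the loop condition and the open check, and the avro_value_reset call is optional (it releases per-record buffers early, but the next read would overwrite them anyway):

```c
#include <avro.h>
#include <stdio.h>

int main(void)
{
    const char *path = "records.avro";   /* hypothetical input file */
    avro_file_reader_t  reader;
    avro_schema_t  file_schema;
    avro_value_iface_t  *file_iface;
    avro_value_t  data;

    if (avro_file_reader(path, &reader) != 0) {
        fprintf(stderr, "Error opening %s: %s\n", path, avro_strerror());
        return 1;
    }

    /* Allocate one value instance, typed by the file's writer schema. */
    file_schema = avro_file_reader_get_writer_schema(reader);
    file_iface = avro_generic_class_from_schema(file_schema);
    avro_generic_value_new(file_iface, &data);

    /* Reuse the same instance for every record in the file. */
    while (avro_file_reader_read_value(reader, &data) == 0) {
        /* ... process "data"; its contents are replaced on the next read ... */
        avro_value_reset(&data);
    }

    /* Release the value, its class, the schema, and the reader. */
    avro_value_decref(&data);
    avro_value_iface_decref(file_iface);
    avro_schema_decref(file_schema);
    avro_file_reader_close(reader);
    return 0;
}
```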