
Avro, mail # user - Projection using C value API

RE: Projection using C value API
Vivek Nadkarni 2011-12-21, 08:02
Hi Alan -

I recently provided some test code that shows an example of resolving reader and writer schemas.  

See the file https://issues.apache.org/jira/secure/attachment/12508142/avro-984-test-v2.c  in https://issues.apache.org/jira/browse/AVRO-984 .

Note that my example resolves between identical reader and writer schemas, because the function init_schema() loads the same schema (NESTED_ARRAY) into both schema_old and schema_new.

However, the same example applies if you have different schemas, SCHEMA_OLD and SCHEMA_NEW. Instead of loading NESTED_ARRAY, use:
avro_schema_from_json( SCHEMA_OLD, sizeof(SCHEMA_OLD), &schema_old, error );
avro_schema_from_json( SCHEMA_NEW, sizeof(SCHEMA_NEW), &schema_new, error );

You should be able to read the record if the reader schema is a subset of the writer schema.
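To make the subset case concrete, here is a minimal, library-free sketch of what schema resolution does for a projection. It assumes a simplified record whose fields are all Avro `long`s (zigzag varints); the helpers `write_long`, `read_long` and `read_projected` are hypothetical names, not part of Avro C. The point it illustrates is that the decoder walks the writer's fields in the writer's order, consuming every one of them, and only copies out the fields the reader schema names (matched by field name).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode v as an Avro long (zigzag, then base-128 varint); returns bytes written. */
static size_t write_long(uint8_t *buf, int64_t v)
{
    uint64_t n = ((uint64_t)v << 1) ^ (uint64_t)(v >> 63);
    size_t len = 0;
    while (n > 0x7f) {
        buf[len++] = (uint8_t)((n & 0x7f) | 0x80);
        n >>= 7;
    }
    buf[len++] = (uint8_t)n;
    return len;
}

/* Decode one Avro long starting at buf[*pos] and advance *pos past it. */
static int64_t read_long(const uint8_t *buf, size_t *pos)
{
    uint64_t n = 0;
    int shift = 0;
    uint8_t b;
    do {
        assert(shift <= 63);            /* a valid long is at most 10 bytes */
        b = buf[(*pos)++];
        n |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return (int64_t)(n >> 1) ^ -(int64_t)(n & 1);
}

/* Decode one record written with writer_fields, keeping only reader_fields.
 * Every writer field is consumed in writer order, so the position stays in
 * step with the data; out[r] receives the value of reader_fields[r].
 * Assumes reader_fields is a subset of writer_fields. Returns bytes consumed. */
static size_t read_projected(const uint8_t *buf,
                             const char **writer_fields, size_t nw,
                             const char **reader_fields, size_t nr,
                             int64_t *out)
{
    size_t pos = 0;
    for (size_t w = 0; w < nw; w++) {
        int64_t v = read_long(buf, &pos);
        for (size_t r = 0; r < nr; r++)
            if (strcmp(writer_fields[w], reader_fields[r]) == 0)
                out[r] = v;             /* field requested by the reader */
        /* fields the reader does not name are decoded and dropped */
    }
    return pos;
}
```

For example, with writer fields {id, count, ts} and reader fields {ts, id}, the decoder consumes all three values but hands back only ts and id. In the Avro C value API, as far as I know, this job is done by a resolved writer (`avro_resolved_writer_new()`, declared in `avro/resolver.h`), which fills a reader-schema value from writer-encoded data.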

As an aside, AVRO-C currently does not support the case in which the reader schema is a superset of the writer schema, because default values for reader schema fields have not yet been implemented in AVRO-C.
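The subset/superset rule can be written down as a small check. This is a hypothetical, simplified model, not Avro C code: a writer-only field is harmless because the decoder just skips it, but a reader-only field has no data to come from, so it can only be filled from a default declared in the reader schema. The `defaults_supported` flag is an assumption that models whether the implementation honours defaults; per the note above, AVRO-C did not at the time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One field of a simplified reader record schema. */
struct reader_field {
    const char *name;
    int has_default;   /* 1 if the reader schema declares a "default" */
};

/* Returns 1 if a record written with writer_fields can be resolved into the
 * reader schema, 0 otherwise. Writer-only fields are skipped; a reader-only
 * field is acceptable only if it has a default AND defaults are supported. */
static int can_resolve(const char **writer_fields, size_t nw,
                       const struct reader_field *reader, size_t nr,
                       int defaults_supported)
{
    assert(writer_fields != NULL && reader != NULL);
    for (size_t r = 0; r < nr; r++) {
        int found = 0;
        for (size_t w = 0; w < nw && !found; w++)
            found = strcmp(reader[r].name, writer_fields[w]) == 0;
        if (!found && !(reader[r].has_default && defaults_supported))
            return 0;  /* nothing in the data, and no usable default */
    }
    return 1;
}
```

With defaults unsupported, a reader schema that adds a field (even one with a declared default) fails this check, while a reader schema that drops fields passes.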


-----Original Message-----
Sent: Tuesday, December 20, 2011 9:53 PM
Subject: Projection using C value API

(Apologies, resending with subject line)

With the legacy datum API, one could create a reader record schema containing a subset of the fields in the writer record schema, and then call avro_file_reader_read, which takes the reader schema as one of its parameters. That call would then correctly read in just the subset of fields defined in the reader schema. But I can't seem to do the same with the new API.

My read code does:

iface = avro_generic_class_from_schema(read_schema);
avro_generic_value_new(iface, &avro_record);
avro_file_reader(filename, &reader);
avro_file_reader_read_value(reader, &avro_record);

If the read schema is the same as the write schema, all is well. But if the read schema has only a subset of the fields in the write schema, avro_file_reader_read_value fails with the error string "Incorrect sync bytes".

Am I doing something wrong here? If so, what is the right way to achieve such a projection read?
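A plausible reading of that error: Avro binary data is not self-describing, so the decoder has to walk exactly the bytes the writer produced. In a container file, each block of records is followed by a 16-byte sync marker. If the records are decoded with a reader schema that has fewer fields, the decoder consumes too few bytes, and the 16 bytes it then compares against the sync marker are really leftover record data. The sketch below is a simplified model of that (records made only of `long` fields; `skip_long` and `read_block` are hypothetical names), not Avro C's actual file reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYNC_SIZE 16

/* Skip one Avro long (zigzag varint) at buf[*pos]: continuation bit is 0x80. */
static void skip_long(const uint8_t *buf, size_t *pos)
{
    while (buf[(*pos)++] & 0x80)
        ;
}

/* Model of reading one block: decode `count` records assuming each record is
 * `fields_per_record` longs, then compare the next 16 bytes with `sync`.
 * Returns 1 if the sync marker matches, 0 otherwise ("Incorrect sync bytes"). */
static int read_block(const uint8_t *buf, size_t len, size_t count,
                      size_t fields_per_record, const uint8_t *sync)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t f = 0; f < fields_per_record; f++)
            skip_long(buf, &pos);
    assert(pos + SYNC_SIZE <= len);
    return memcmp(buf + pos, sync, SYNC_SIZE) == 0;
}
```

Decoding two records of a three-field writer schema with the writer's field count lands exactly on the sync marker; decoding them as one-field records stops four bytes early, and the marker comparison fails.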