Schema evolution and projection
Hi,

I am struggling to familiarise myself with schema evolution and schema
projection using the avro-c implementation.

There doesn't seem to be much information available on how to perform these
tasks. The examples on the C API page confusingly mix the old datum API
with the new value API.

I have built what I think is a really simple example of testing schema
projection but it does not work the way I think it should work - more than
likely my understanding is wrong.

When I ask for one particular field of a record (by specifying the field
name), I instead get every field whose type matches the requested field's
type.

The file projection_01.c (attached and at
https://gist.github.com/claws/5056626) defines a really simple record with
two int fields, Field_1 and Field_2.

If I avrocat the container file I see:
{"Field_1": 1, "Field_2": 1}
{"Field_1": 2, "Field_2": 2}
{"Field_1": 3, "Field_2": 3}
{"Field_1": 4, "Field_2": 4}
{"Field_1": 5, "Field_2": 5}

The projection schema being used is a record only containing Field_2 of
type int. I only expected that field to be returned by the reader yet I
receive every int type field, confusingly labelled as "Field_2".
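
To make the expectation concrete, here is a minimal sketch in plain C of
the name-based matching I expected. It does not use the avro-c API at all;
struct int_field and project_by_name are hypothetical names made up for
illustration, and real records are of course not limited to int fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one int field of a decoded record. */
struct int_field {
    const char *name;
    int value;
};

/*
 * Copy into out[] only those fields of rec[] whose name appears in
 * wanted[]. Matching is by field name, never by type.
 * Returns the number of fields copied (at most out_cap).
 */
size_t project_by_name(const struct int_field *rec, size_t n_rec,
                       const char *const *wanted, size_t n_wanted,
                       struct int_field *out, size_t out_cap)
{
    size_t n_out = 0;

    assert(rec != NULL && wanted != NULL && out != NULL);

    for (size_t i = 0; i < n_rec; i++) {
        for (size_t j = 0; j < n_wanted; j++) {
            if (strcmp(rec[i].name, wanted[j]) == 0 && n_out < out_cap) {
                out[n_out++] = rec[i];
                break; /* each source field is copied at most once */
            }
        }
    }
    return n_out;
}
```

Applied to a record holding Field_1 and Field_2 with wanted set to just
"Field_2", this copies exactly one field per record, which is the shape of
output I expected from the reader.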

When I run projection_01.c I see:
{"Field_2": 1}
{"Field_2": 1}
{"Field_2": 2}
{"Field_2": 2}
{"Field_2": 3}
{"Field_2": 3}
{"Field_2": 4}
{"Field_2": 4}
{"Field_2": 5}
{"Field_2": 5}

Is this how schema projection is supposed to work? Does it just return
items of the same type irrespective of the field name specified?

I think I am missing something about how this is supposed to work. Perhaps
my example record is too simple.

So, I then created a slightly more complex schema that contained a
sub-record and the projection seems to work how I think it should work.
This can be seen in the output from projection_02.c (attached and at
https://gist.github.com/claws/5056643) which returns:
{"Field_2": {"SubField_1": 1, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 2, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 3, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 4, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 5, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}

From this trial and error it appears that the projection will return me
values that match the projection schema's types - but does not take into
account any 'name' fields. Would that be an accurate assessment?

Can anyone provide some more information on schema projection?
Is there a good example anywhere?

Regards,
Chris