Avro >> mail # user >> Schema evolution and projection


Schema evolution and projection
Hi,

I am struggling to familiarise myself with schema evolution and schema
projection using the avro-c implementation.

There doesn't seem to be much information available on how to perform these
tasks. The examples on the C API page confusingly mix the old datum API
with the new value API.

I have built what I think is a really simple example of testing schema
projection but it does not work the way I think it should work - more than
likely my understanding is wrong.

When I ask for one particular field of a record (by specifying the field
name), I instead get every field whose type matches the requested field's
type.

The file projection_01.c (attached and at
https://gist.github.com/claws/5056626) defines a really simple record with
two int fields, Field_1 and Field_2.
If I avrocat the container file I see:
{"Field_1": 1, "Field_2": 1}
{"Field_1": 2, "Field_2": 2}
{"Field_1": 3, "Field_2": 3}
{"Field_1": 4, "Field_2": 4}
{"Field_1": 5, "Field_2": 5}

The projection schema being used is a record only containing Field_2 of
type int. I only expected that field to be returned by the reader yet I
receive every int type field, confusingly labelled as "Field_2".

When I run projection_01.c I see:
{"Field_2": 1}
{"Field_2": 1}
{"Field_2": 2}
{"Field_2": 2}
{"Field_2": 3}
{"Field_2": 3}
{"Field_2": 4}
{"Field_2": 4}
{"Field_2": 5}
{"Field_2": 5}

Is this how schema projection is supposed to work? Does it just return
items of the same type irrespective of the field name specified?

I think I am missing something about how this is supposed to work. Perhaps
my example record is too simple.

So, I then created a slightly more complex schema that contained a
sub-record and the projection seems to work how I think it should work.
This can be seen in the output from projection_02.c (attached and at
https://gist.github.com/claws/5056643) which returns:
{"Field_2": {"SubField_1": 1, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 2, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 3, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 4, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}
{"Field_2": {"SubField_1": 5, "SubField_2": 42}}
{"Field_2": {"SubField_1": 24, "SubField_2": 3}}

From this trial and error it appears that the projection will return me
values that match the projection schema's types - but does not take into
account any 'name' fields. Would that be an accurate assessment?

Can anyone provide some more information on schema projection?
Is there a good example anywhere?

Regards,
Chris