Chris Laws 2013-02-28, 13:21
Re: Schema evolution and projection
I'm not familiar with the C implementation, but it should follow the
resolution rules from the specification:

http://avro.apache.org/docs/current/spec.html#Schema+Resolution

We call it "projection" when schema resolution is used with a subset
schema as the reader's schema.  A subset is created by removing fields
from the writer's schema that are not required.
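
To make the rule concrete, here is a minimal sketch in plain C of how record fields are matched during resolution. It does not use the avro-c API; `field_t`, `find_field` and `project_record` are hypothetical names invented for this illustration, and every field is assumed to be an int. The point it shows is the one from the specification: a reader field is paired with the writer field of the same name, and writer fields the reader does not name are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one record field (hypothetical, not an avro-c type). */
typedef struct {
    const char *name;
    int value;
} field_t;

/* Return the index of the field called `name`, or -1 if there is none. */
int find_field(const field_t *fields, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}

/*
 * Fill each reader field from the written field with the same name.
 * Written fields that the reader does not name are simply skipped.
 * Returns 0 on success, or -1 if a reader field has no match. (The
 * specification would use the reader field's default value here, or
 * signal an error if no default exists; defaults are not modelled.)
 */
int project_record(const field_t *written, size_t n_written,
                   field_t *reader, size_t n_reader)
{
    assert(written != NULL && reader != NULL);
    for (size_t i = 0; i < n_reader; i++) {
        int idx = find_field(written, n_written, reader[i].name);
        if (idx < 0)
            return -1;
        reader[i].value = written[idx].value;
    }
    return 0;
}
```

With a written record of `{"Field_1", 1}, {"Field_2", 7}` and a reader list containing only `{"Field_2", 0}`, `project_record` returns 0 and sets the reader's value to 7. Field_1 is never copied, even though it has the same type.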

Does that help?

Doug

On Thu, Feb 28, 2013 at 5:21 AM, Chris Laws <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am struggling to familiarise myself with schema evolution and schema
> projection using the avro-c implementation.
>
> There doesn't seem to be much information available on how to perform these
> tasks. The examples on the C API page confusingly mix the old datum API with
> the new value API.
>
> I have built what I think is a really simple example of testing schema
> projection but it does not work the way I think it should work - more than
> likely my understanding is wrong.
>
> When I ask for one particular field of a record (by specifying the field
> name), I instead get every field that matches the requested type.
>
> The attached file projection_01.c (attached and at
> https://gist.github.com/claws/5056626) defines a really simple record with
> If I avrocat the container file I see:
> {"Field_1": 1, "Field_2": 1}
> {"Field_1": 2, "Field_2": 2}
> {"Field_1": 3, "Field_2": 3}
> {"Field_1": 4, "Field_2": 4}
> {"Field_1": 5, "Field_2": 5}
>
> The projection schema being used is a record only containing Field_2 of type
> int. I only expected that field to be returned by the reader yet I receive
> every int type field, confusingly labelled as "Field_2".
>
> When I run projection_01.c I see:
> {"Field_2": 1}
> {"Field_2": 1}
> {"Field_2": 2}
> {"Field_2": 2}
> {"Field_2": 3}
> {"Field_2": 3}
> {"Field_2": 4}
> {"Field_2": 4}
> {"Field_2": 5}
> {"Field_2": 5}
>
> Is this how schema projection is supposed to work? Does it just return items
> of the same type irrespective of the field name specified?
>
> I think I am missing something about how this is supposed to work. Perhaps
> my example record is too simple.
>
> So, I then created a slightly more complex schema that contained a
> sub-record and the projection seems to work how I think it should work. This
> can be seen in the output from projection_02.c (attached and at
> https://gist.github.com/claws/5056643) which returns:
> {"Field_2": {"SubField_1": 1, "SubField_2": 42}}
> {"Field_2": {"SubField_1": 24, "SubField_2": 3}}
> {"Field_2": {"SubField_1": 2, "SubField_2": 42}}
> {"Field_2": {"SubField_1": 24, "SubField_2": 3}}
> {"Field_2": {"SubField_1": 3, "SubField_2": 42}}
> {"Field_2": {"SubField_1": 24, "SubField_2": 3}}
> {"Field_2": {"SubField_1": 4, "SubField_2": 42}}
> {"Field_2": {"SubField_1": 24, "SubField_2": 3}}
> {"Field_2": {"SubField_1": 5, "SubField_2": 42}}
> {"Field_2": {"SubField_1": 24, "SubField_2": 3}}
>
> From this trial and error it appears that the projection will return me
> values that match the projection schema's types - but does not take into
> account any 'name' fields. Would that be an accurate assessment?
>
> Can anyone provide some more information on schema projection?
> Is there a good example anywhere?
>
> Regards,
> Chris
>
>
>
>