Avro user mailing list: Schema evolution and projection


Re: Schema evolution and projection
Martin,

Yes, I had declared the reader schema used for my evolution test to have a
default value and to be a union with null. Apologies for not including that
information in my earlier post.

It makes sense for my applications to receive a default value rather than a
null, so in my extension to the example I have made the new field a union
with null but given it an integer default value.

I thought that I should be able to use the same example code that Douglas
Creager provided to demonstrate schema projection because, if I understand
correctly, it performs the necessary resolution for both projection and
evolution.

So if I stick with the resolver-writer.c example and I declare a new schema
that has an extra field:

#define READER_SCHEMA_C \
    "{" \
    "  \"type\": \"record\"," \
    "  \"name\": \"test\"," \
    "  \"fields\": [" \
    "    { \"name\": \"a\", \"type\": \"int\" }," \
    "    { \"name\": \"b\", \"type\": \"int\" }," \
    "    { \"name\": \"c\", \"type\": [\"null\", \"int\"], \"default\": 42 }" \
    "  ]" \
    "}"

and then use it in the resolver-writer.c code:

    printf("Reading evolved data with schema resolution, showing new field \"c\"...\n");
    read_with_schema_resolution(FILENAME, READER_SCHEMA_C, "c");

I get:

Reading evolved data with schema resolution, showing new field "c"...
Error: Reader field c doesn't appear in writer

I was under the impression that I should have received the default value of
42 for field 'c' for each item in the data file.

BTW, I had already come across your blog post during my Avro research and
found it very useful.

Regards,
Chris
On Sat, Mar 2, 2013 at 12:23 AM, Martin Kleppmann <[EMAIL PROTECTED]> wrote:

> Chris,
>
> If you want a field in your reader schema that is not present in your
> writer schema, you have to set a default value — otherwise the reader
> has no way of knowing how to fill in that Field_3! If no particular
> default value makes sense, a standard technique is to make the field
> type a union with null, and to make null the default value
> (effectively making the field optional).
>
> For example:
>
> const char EXTENDED_SCHEMA[] = "{\"type\":\"record\",\
>   \"name\":\"SimpleScehma\",\
>   \"fields\":[\
>      {\"name\": \"Field_1\", \"type\": \"int\"},\
>      {\"name\": \"Field_2\", \"type\": \"int\"},\
>      {\"name\": \"Field_3\", \"type\": [\"null\", \"int\"], \"default\": null}]}";
>
> To build your intuitive understanding of how schema evolution works,
> you might find this post useful:
>
> http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>
> Best,
> Martin
>
> On 1 March 2013 01:50, Chris Laws <[EMAIL PROTECTED]> wrote:
> > Doug,
> >
> > I have updated my test code in line with your excellent example and I now
> > have the projection aspect working well.
> >
> > Now... I'm stuck on a schema evolution test. Basically, if I use your example
> > as the foundation and create a new schema based on the WRITER_SCHEMA in
> > which I add a new field to the end (to model schema evolution), I receive an
> > error when trying to create the writer_iface:
> >
> > writer_iface = avro_resolved_writer_new(writer_schema, reader_schema);
> >
> > "Reader field Field_3 doesn't appear in writer"
> >
> > Any chance you could extend your example to show Avro's ability to
> > read data from a data file using an evolved schema (say, in a simple
> > situation where a new field is added to the schema)?
> >
> > Regards,
> > Chris
> >
> >
> >
> > On Fri, Mar 1, 2013 at 9:08 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:
> >>
> >> > Thanks for the informative reply. I look forward to the example code,
> >> > that is exactly what I'm after.
> >> >
> >> > I'm really struggling with my schema evolution testing. I thought I'd
> >> > post a question about schema projection because it seemed simpler but I
> >> > guess it also rests on creating a resolver. I have not found a clear and
> >> > simple example of how to do it using avro-c. I've trawled the test