Avro >> mail # user >> Schema evolution and projection


Re: Schema evolution and projection
Martin,

Yes, I had declared the reader schema used for my evolution test to have a
default value and to be a union with null. Apologies for not including that
information in my earlier post.

It makes sense for my applications to receive a default value rather than a
null, so in my extension to the example I have made the new field a union
with null but set its default to an integer value.

I thought I should be able to use the same example code Douglas Creager
provided to demonstrate schema projection, because, if I understand
correctly, it performs the necessary resolution whether for projection or
evolution.

So if I stick with the resolver-writer.c example and I declare a new schema
that has an extra field:

#define READER_SCHEMA_C \
    "{" \
    "  \"type\": \"record\"," \
    "  \"name\": \"test\"," \
    "  \"fields\": [" \
    "    { \"name\": \"a\", \"type\": \"int\" }," \
    "    { \"name\": \"b\", \"type\": \"int\" }," \
    "    { \"name\": \"c\", \"type\": [\"null\", \"int\"], \"default\": 42 }" \
    "  ]" \
    "}"

and then use it in the resolver-writer.c code:

    printf("Reading evolved data with schema resolution, showing new field \"c\"...\n");
    read_with_schema_resolution(FILENAME, READER_SCHEMA_C, "c");

I get:

Reading evolved data with schema resolution, showing new field "c"...
Error: Reader field c doesn't appear in writer

I was under the impression that I should have received the default value of
42 for field 'c' for each item in the data file.

BTW, I had come across your blog post in my Avro research. I found it very
useful.

Regards,
Chris
On Sat, Mar 2, 2013 at 12:23 AM, Martin Kleppmann <[EMAIL PROTECTED]> wrote:

> Chris,
>
> If you want a field in your reader schema that is not present in your
> writer schema, you have to set a default value; otherwise the reader
> has no way of knowing how to fill in Field_3! If no particular
> default value makes sense, a standard technique is to make the field
> type a union with null, and to make null the default value
> (effectively making the field optional).
>
> For example:
>
> const char EXTENDED_SCHEMA[] =
>   "{\"type\":\"record\",\
>   \"name\":\"SimpleScehma\",\
>   \"fields\":[\
>      {\"name\": \"Field_1\", \"type\": \"int\"},\
>      {\"name\": \"Field_2\", \"type\": \"int\"},\
>      {\"name\": \"Field_3\", \"type\": [\"null\", \"int\"], \"default\": null}]}";
>
> To build your intuitive understanding of how schema evolution works,
> you might find this post useful:
>
> http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>
> Best,
> Martin
>
> On 1 March 2013 01:50, Chris Laws <[EMAIL PROTECTED]> wrote:
> > Doug,
> >
> > I have updated my test code in line with your excellent example and I now
> > have the projection aspect working well.
> >
> > Now... I'm stuck on a schema evolution test. Basically, if I use your example
> > as the foundation and create a new schema based on the WRITER_SCHEMA in
> > which I add a new field to the end (to model schema evolution), I receive an
> > error when trying to create the writer_iface.
> >
> > writer_iface = avro_resolved_writer_new(writer_schema, reader_schema);
> >
> > "Reader field Field_3 doesn't appear in writer"
> >
> > Any chance you could extend your example to show Avro's ability to
> > read data from a data file using an evolved schema (say, in a simple
> > situation where a new field is added to the schema)?
> >
> > Regards,
> > Chris
> >
> >
> >
> > On Fri, Mar 1, 2013 at 9:08 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:
> >>
> >> > Thanks for the informative reply. I look forward to the example code;
> >> > that is exactly what I'm after.
> >> >
> >> > I'm really struggling with my schema evolution testing. I thought I'd
> >> > post a question about schema projection because it seemed simpler, but I
> >> > guess it also rests on creating a resolver. I have not found a clear and
> >> > simple example of how to do it using avro-c. I've trawled the test