Avro, mail # user - Schema evolution and projection


Chris Laws 2013-02-28, 13:21
Doug Cutting 2013-02-28, 17:53
Douglas Creager 2013-02-28, 21:01
Re: Schema evolution and projection
Chris Laws 2013-02-28, 21:13
Thanks for the informative reply. I look forward to the example code; that
is exactly what I'm after.

I'm really struggling with my schema evolution testing. I thought I'd post
a question about schema projection because it seemed simpler, but I guess it
also rests on creating a resolver. I have not found a clear and simple
example of how to do it using avro-c. I've trawled the test code for
examples but, as I mentioned, I can't find a clear and simple one.

I realise that the majority of Avro usage appears to be in Java; however, I
need to use avro-c for my assessment of Avro because a large portion of our
system uses C.

Thanks for your help.
Chris
On Fri, Mar 1, 2013 at 7:31 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:

> > There doesn't seem to be much information available on how to perform
> > these tasks. The examples on the C API page confusingly mix the old
> > datum API with the new value API.
>
> Apologies for that — you're absolutely right that we need to clean up
> the C API documentation a bit.
>
> > Is this how schema projection is supposed to work? Does it just return
> > items of the same type irrespective of the field name specified?
>
> tl;dr — The schema projection doesn't happen for free; you need to use a
> "resolved writer" to perform the schema resolution.
>
> In the C API, when you open an Avro file for reading, we expect that the
> avro_value_t that you pass in to avro_file_reader_read_value has the
> *exact same* schema that was used to create the file.  So in your first
> example (gist 5056626), your read_archive_test function works great
> since it's explicitly asking the file for the writer schema, and using
> that to create the value instance to read into.  If you know that you
> want to read exactly what's in the file, not perform any schema
> resolution, and (optionally) dynamically interrogate the writer schema
> to see what fields are available, this is exactly the right approach.
>
> On the other hand, if you want to use schema resolution to project away
> some of the fields (or to do other interesting data conversions), you
> need to create a resolved writer to perform that schema resolution.  The
> resolved writer is an avro_value_iface_t that wraps up the schema
> resolution rules for a particular writer schema and reader schema.  When
> you create an avro_value_t instance of the resolved writer, it looks
> like it's an instance of the writer schema, and it wraps an instance of
> the reader schema.  Since the resolved writer value is an instance of
> the writer schema, you can read data into it using
> avro_file_reader_read_value.  Under the covers, it will perform the
> schema resolution and fill in the wrapped reader schema instance.  You
> can then read the projected data out of your reader value.
>
> In English that's probably still a bit too dense of an explanation; I'll
> whip together an example program and post it as a gist so that you can
> see it in actual code.
>
> (As an aside, the reason the original projection_test worked the way it
> did is that a single "record { int, int }" value happens to have the
> same serialization as two consecutive "int" values.
> avro_file_reader_read_value doesn't do any schema resolution; it just
> tries to read a value of the type that you pass in.)
>
> cheers
> –doug
>
>
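The first approach Douglas describes (asking the file for its writer schema and reading with exactly that schema, no resolution) might look like the sketch below. The file name "archive.avro" and the error handling are illustrative assumptions, not from the thread:

```c
#include <stdio.h>
#include <stdlib.h>
#include <avro.h>

int main(void)
{
    avro_file_reader_t file;
    /* "archive.avro" is a hypothetical file name for illustration */
    if (avro_file_reader("archive.avro", &file)) {
        fprintf(stderr, "Error opening file: %s\n", avro_strerror());
        return EXIT_FAILURE;
    }

    /* Ask the file for the schema it was written with */
    avro_schema_t writer_schema = avro_file_reader_get_writer_schema(file);

    /* Build a generic value instance of that *exact* schema */
    avro_value_iface_t *iface = avro_generic_class_from_schema(writer_schema);
    avro_value_t value;
    avro_generic_value_new(iface, &value);

    /* Read records; no schema resolution happens here */
    while (avro_file_reader_read_value(file, &value) == 0) {
        /* interrogate `value` dynamically, e.g. via avro_value_get_by_name */
        avro_value_reset(&value);
    }

    avro_value_decref(&value);
    avro_value_iface_decref(iface);
    avro_schema_decref(writer_schema);
    avro_file_reader_close(file);
    return EXIT_SUCCESS;
}
```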
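The resolved-writer approach for projection might look like the following sketch. The writer file, the file name, and the single-field reader schema are all illustrative assumptions; the wiring (resolved writer wrapping a reader-schema value) follows the description above:

```c
#include <stdio.h>
#include <stdlib.h>
#include <avro.h>

/* A hypothetical reader schema that projects the record down to one field */
#define READER_SCHEMA \
    "{\"type\":\"record\",\"name\":\"rec\"," \
    "\"fields\":[{\"name\":\"a\",\"type\":\"int\"}]}"

int main(void)
{
    avro_file_reader_t file;
    if (avro_file_reader("archive.avro", &file)) {  /* hypothetical file */
        fprintf(stderr, "Error opening file: %s\n", avro_strerror());
        return EXIT_FAILURE;
    }

    avro_schema_t writer_schema = avro_file_reader_get_writer_schema(file);

    avro_schema_t reader_schema;
    if (avro_schema_from_json_literal(READER_SCHEMA, &reader_schema)) {
        fprintf(stderr, "Bad reader schema: %s\n", avro_strerror());
        return EXIT_FAILURE;
    }

    /* The resolved writer encapsulates the resolution rules between
     * the two schemas; it fails if they are incompatible */
    avro_value_iface_t *resolved =
        avro_resolved_writer_new(writer_schema, reader_schema);
    if (resolved == NULL) {
        fprintf(stderr, "Schemas incompatible: %s\n", avro_strerror());
        return EXIT_FAILURE;
    }

    /* The value we actually want: an instance of the reader schema */
    avro_value_iface_t *reader_iface =
        avro_generic_class_from_schema(reader_schema);
    avro_value_t reader_value;
    avro_generic_value_new(reader_iface, &reader_value);

    /* The resolved-writer value looks like the writer schema on the
     * outside and fills in reader_value underneath */
    avro_value_t resolved_value;
    avro_resolved_writer_new_value(resolved, &resolved_value);
    avro_resolved_writer_set_dest(&resolved_value, &reader_value);

    while (avro_file_reader_read_value(file, &resolved_value) == 0) {
        /* reader_value now holds the projected record */
        avro_value_reset(&reader_value);
    }

    avro_value_decref(&resolved_value);
    avro_value_decref(&reader_value);
    avro_value_iface_decref(reader_iface);
    avro_value_iface_decref(resolved);
    avro_schema_decref(reader_schema);
    avro_schema_decref(writer_schema);
    avro_file_reader_close(file);
    return EXIT_SUCCESS;
}
```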
Douglas Creager 2013-02-28, 22:38
Chris Laws 2013-03-01, 01:50
Martin Kleppmann 2013-03-01, 13:53
Chris Laws 2013-03-01, 22:26
Douglas Creager 2013-03-01, 23:34
Chris Laws 2013-03-02, 02:05
Doug Cutting 2013-03-01, 22:47
Doug Cutting 2013-03-01, 01:30
Chris Laws 2013-03-01, 01:41
Chris Laws 2013-03-01, 01:14