Avro >> mail # user >> Schema evolution and projection


Re: Schema evolution and projection
Thanks for the informative reply. I look forward to the example code, that
is exactly what I'm after.

I'm really struggling with my schema evolution testing. I thought I'd post
a question about schema projection because it seemed simpler, but I guess
it also rests on creating a resolver. I've trawled the avro-c test code,
but I can't find a clear and simple example of how to do this.

I realise that the majority of Avro usage appears to be in Java; however, I
need to use avro-c for my assessment of Avro because a large portion of our
system uses C.

Thanks for your help.
Chris
On Fri, Mar 1, 2013 at 7:31 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:

> > There doesn't seem to be much information available on how to perform
> > these tasks. The examples on the C API page confusingly mix the old
> > datum API with the new value API.
>
> Apologies for that — you're absolutely right that we need to clean up
> the C API documentation a bit.
>
> > Is this how schema projection is supposed to work? Does it just return
> > items of the same type irrespective of the field name specified?
>
> tl;dr — The schema projection doesn't happen for free; you need to use a
> "resolved writer" to perform the schema resolution.
>
> In the C API, when you open an Avro file for reading, we expect that the
> avro_value_t that you pass in to avro_file_reader_read_value has the
> *exact same* schema that was used to create the file.  So in your first
> example (gist 5056626), your read_archive_test function works great
> since it's explicitly asking the file for the writer schema, and using
> that to create the value instance to read into.  If you know that you
> want to read exactly what's in the file, not perform any schema
> resolution, and (optionally) dynamically interrogate the writer schema
> to see what fields are available, this is exactly the right approach.
>
> On the other hand, if you want to use schema resolution to project away
> some of the fields (or to do other interesting data conversions), you
> need to create a resolved writer to perform that schema resolution.  The
> resolved writer is an avro_value_iface_t that wraps up the schema
> resolution rules for a particular writer schema and reader schema.  When
> you create an avro_value_t instance of the resolved writer, it looks
> like it's an instance of the writer schema, and it wraps an instance of
> the reader schema.  Since the resolved writer value is an instance of
> the writer schema, you can read data into it using
> avro_file_reader_read_value.  Under the covers, it will perform the
> schema resolution and fill in the wrapped reader schema instance.  You
> can then read the projected data out of your reader value.
>
> In English that's probably still a bit too dense an explanation; I'll
> whip together an example program and post it as a gist so that you can
> see it in actual code.
>
> (As an aside, the reason your original projection_test worked the way it
> did is because a single "record { int, int }" value happens to have the
> same serialization as two consecutive "int" values.
> avro_file_reader_read_value doesn't do any schema resolution, it just
> tries to read a value of the type that you pass in.)
>
> cheers
> –doug
>
>