Re: static schema validation
That sounds like what I'm looking for. I'll take a look!

Thanks,
- Aaron
On Jan 31, 2013 10:39 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Aaron,
>
> You can use the SchemaNormalization class to test if two schemas are
> effectively identical:
>
>
> http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
>
> AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
> can, with resolution, read the other) and to combine multiple schemas
> into a single that subsumes them all.
>
> https://issues.apache.org/jira/browse/AVRO-816
>
> Bob Cotton recently suggested that we should commit some form of this.
>  I'd be happy to do this if others agree.
>
> Doug
>
> On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]>
> wrote:
> > Does Avro have an API to allow you to tell whether two schemas are a
> > match, statically?
> >
> > i.e., schema1.canRead(schema2) /** return true iff schema1 can be used
> > as a reader schema for schema2 */
> >
> > From my (admittedly cursory) scan of the docs + source, it seems like
> > there isn't something quite that concise, though maybe this can be
> > accomplished using ResolvingGrammarGenerator?
> >
> > I'm pessimistic because of the following quote from the spec [1]
> >
> > [matching] if both are unions:
> > The first schema in the reader's union that matches the selected writer's
> > union schema is recursively resolved against it. if none match, an error
> > is signalled.
> >
> > That sentence makes me think it's context dependent; I interpret "the
> > selected writer's union schema" as "the schema of the actual thing
> > written in a data buffer, which is one of the possible schemas the writer
> > declared in her union type". i.e., you can only tell if schema R can be a
> > reader for some other schema W in terms of a literal record written by W,
> > and cannot be deduced statically for all possible records that can be
> > encoded with schema W.  Is this interpretation correct? If so, does anyone
> > have any ideas how to ensure the best bounds on statically-guaranteed
> > backward compatibility between a given reader and writer?
> >
> > Thanks,
> > - Aaron
> >
> > [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>
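
On the union question quoted above: one way to get a static guarantee is to
require that every branch of the writer's union has a matching branch in the
reader's union, so that no datum the writer could emit fails resolution. The
sketch below is a toy model of that idea only. It does not use the Avro
library; the class StaticUnionCheck and its methods schema, matches and
canRead are hypothetical names made up for illustration, and a schema is
reduced to a list of primitive type names (a one-element list standing for a
non-union schema). The promotion table follows the primitive promotions
listed in the Schema Resolution section of the spec.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model, NOT the Avro API: schemas are lists of primitive type names. */
final class StaticUnionCheck {
    // Writer type -> reader types it may be promoted to (per the Avro spec).
    private static final Map<String, Set<String>> PROMOTIONS = new HashMap<>();
    static {
        PROMOTIONS.put("int", new HashSet<>(Arrays.asList("long", "float", "double")));
        PROMOTIONS.put("long", new HashSet<>(Arrays.asList("float", "double")));
        PROMOTIONS.put("float", new HashSet<>(Arrays.asList("double")));
        PROMOTIONS.put("string", new HashSet<>(Arrays.asList("bytes")));
        PROMOTIONS.put("bytes", new HashSet<>(Arrays.asList("string")));
    }

    /** Hypothetical helper: builds a toy schema from its branch type names. */
    static List<String> schema(String... types) {
        return Arrays.asList(types);
    }

    /** True if a reader branch can resolve a writer branch (same type or promotion). */
    static boolean matches(String reader, String writer) {
        if (reader.equals(writer)) {
            return true;
        }
        Set<String> targets = PROMOTIONS.get(writer);
        return targets != null && targets.contains(reader);
    }

    /**
     * Static check: true only if EVERY writer branch is matched by some
     * reader branch, i.e. no datum written with the writer schema can fail.
     */
    static boolean canRead(List<String> reader, List<String> writer) {
        for (String w : writer) {
            boolean found = false;
            for (String r : reader) {
                if (matches(r, w)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // null matches null, and int is promotable to long.
        System.out.println(canRead(schema("null", "long"), schema("null", "int"))); // true
        // The writer may emit null, which a plain string reader cannot resolve.
        System.out.println(canRead(schema("string"), schema("null", "string"))); // false
    }
}
```

The second call is the context-dependent case from the question: a string
reader works for some data written with ["null", "string"] but not all of it,
so the strict check rejects it. Real schemas also need recursion into records,
arrays, maps, enums and fixed types, which this sketch leaves out.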