Avro, mail # user - static schema validation


Re: static schema validation
Doug Cutting 2013-01-31, 18:39
Aaron,

You can use the SchemaNormalization class to test if two schemas are
effectively identical:

http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
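As a rough illustration of the idea: once two schemas have been reduced to Parsing Canonical Form text, comparing them comes down to comparing the strings or a fingerprint of them. The sketch below follows the 64-bit fingerprint routine given in the spec linked above. It assumes the inputs are already canonical-form strings and does not perform the normalization itself. The class name CanonicalFingerprint and the method sameSchema are hypothetical names for this sketch, not part of Avro.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: fingerprints text that is assumed
// to already be in Parsing Canonical Form.
public class CanonicalFingerprint {
    // Seed value and polynomial constant used by the spec's 64-bit fingerprint.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    private static final long[] FP_TABLE = new long[256];

    static {
        // Precompute the per-byte lookup table.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    // Fingerprint the UTF-8 bytes of a canonical-form schema string.
    static long fingerprint64(String canonicalForm) {
        long fp = EMPTY;
        for (byte b : canonicalForm.getBytes(StandardCharsets.UTF_8)) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Equal fingerprints strongly suggest identical canonical forms;
    // compare the strings directly when a collision must be ruled out.
    static boolean sameSchema(String canonicalA, String canonicalB) {
        return fingerprint64(canonicalA) == fingerprint64(canonicalB);
    }
}
```

In practice the SchemaNormalization class should be preferred, since it also handles the normalization step that this sketch skips.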

AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
can, with resolution, read the other) and to combine multiple schemas
into a single schema that subsumes them all.

https://issues.apache.org/jira/browse/AVRO-816
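
To make "subsumes" concrete for the union case raised in the question below, here is a minimal sketch, not the AVRO-816 code. It models a union as a list of primitive type names and assumes only the numeric promotions (int to long, float, or double; long to float or double; float to double). The class UnionSubsumption and its methods are hypothetical. The point it illustrates is that a static check has to cover every branch the writer might select, not just the branch actually present in one datum.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Avro: unions of primitive type names only.
public class UnionSubsumption {
    // Writer type -> reader types it may be promoted to (assumed subset of the rules).
    private static final Map<String, List<String>> PROMOTIONS = new HashMap<>();

    static {
        PROMOTIONS.put("int", Arrays.asList("long", "float", "double"));
        PROMOTIONS.put("long", Arrays.asList("float", "double"));
        PROMOTIONS.put("float", Arrays.asList("double"));
    }

    // True if a reader branch can read data written with the writer branch.
    static boolean matches(String reader, String writer) {
        if (reader.equals(writer)) {
            return true;
        }
        List<String> promotable = PROMOTIONS.get(writer);
        return promotable != null && promotable.contains(reader);
    }

    // True only if every branch of the writer's union is matched by
    // at least one branch of the reader's union.
    static boolean canRead(List<String> readerUnion, List<String> writerUnion) {
        for (String w : writerUnion) {
            boolean matched = false;
            for (String r : readerUnion) {
                if (matches(r, w)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a reader union of null and long can read a writer union of null and int, but not the other way around, because a written long has no matching branch in a reader that only accepts null and int.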

Bob Cotton recently suggested that we should commit some form of this.
 I'd be happy to do this if others agree.

Doug

On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Does Avro have an API to allow you to tell whether two schemas are a match,
> statically?
>
> i.e., schema1.canRead(schema2) /** return true iff schema1 can be used as a
> reader schema for schema2 */
>
> From my (admittedly cursory) scan of the docs + source, it seems like
> there isn't something quite that concise, though maybe this can be
> accomplished using ResolvingGrammarGenerator?
>
> I'm pessimistic because of the following quote from the spec [1]
>
> [matching] if both are unions:
> The first schema in the reader's union that matches the selected writer's
> union schema is recursively resolved against it. if none match, an error is
> signalled.
>
> That sentence makes me think it's context dependent; I interpret "the
> selected writer's union schema" as "the schema of the actual thing written
> in a data buffer, which is one of the possible schemas the writer declared
> in her union type". i.e., you can only tell if schema R can be a reader for
> some other schema W in terms of a literal record written by W; it cannot be
> deduced statically for all possible records that can be encoded with schema
> W.  Is this interpretation correct? If so, does anyone have any ideas how to
> ensure the best bounds on statically-guaranteed backward compatibility
> between a given reader and writer?
>
> Thanks,
> - Aaron
>
> [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution