Avro >> mail # user >> static schema validation


Aaron Kimball 2013-01-30, 23:17
Re: static schema validation
Aaron,

You can use the SchemaNormalization class to test if two schemas are
effectively identical:

http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html

AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
can, with resolution, read the other) and to combine multiple schemas
into a single schema that subsumes them all.

https://issues.apache.org/jira/browse/AVRO-816

Bob Cotton recently suggested that we should commit some form of this.
I'd be happy to do this if others agree.

Doug

On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Does Avro have an API to allow you to tell whether two schemas are a match,
> statically?
>
> i.e., schema1.canRead(schema2) /** return true iff schema1 can be used as a
> reader schema for schema2 */
>
> From my (admittedly cursory) scan of the docs + source, it seems like
> there isn't something quite that concise, though maybe this can be
> accomplished using ResolvingGrammarGenerator?
>
> I'm pessimistic because of the following quote from the spec [1]
>
> [matching] if both are unions:
> The first schema in the reader's union that matches the selected writer's
> union schema is recursively resolved against it. if none match, an error is
> signalled.
>
> That sentence makes me think it's context dependent; I interpret "the
> selected writer's union schema" as "the schema of the actual thing written
> in a data buffer, which is one of the possible schemas the writer declared
> in her union type". i.e., you can only tell if schema R can be a reader for
> some other schema W in terms of a literal record written by W, and cannot be
> deduced statically for all possible records that can be encoded with schema
> W.  Is this interpretation correct? If so, does anyone have any ideas how to
> ensure the best bounds on statically-guaranteed backward compatibility
> between a given reader and writer?
>
> Thanks,
> - Aaron
>
> [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
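
On the union question above, one conservative way to get a static answer is
to require that every branch of the writer's union be readable by the reader,
since any branch may be the one selected at write time. The following is a
minimal sketch of that idea in Java, not the AVRO-816 code and not the Avro
API: ToySchema and StaticResolution are hypothetical names, the model covers
only primitive types and unions, and the promotion table is taken from the
Schema Resolution section of the current spec.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy schema model (hypothetical, not the Avro API): a primitive type name or a union. */
final class ToySchema {
    final String type;              // e.g. "int", "string", or "union"
    final List<ToySchema> branches; // non-empty only when type is "union"

    private ToySchema(String type, List<ToySchema> branches) {
        this.type = type;
        this.branches = branches;
    }

    static ToySchema primitive(String type) {
        return new ToySchema(type, Collections.<ToySchema>emptyList());
    }

    static ToySchema union(ToySchema... branches) {
        return new ToySchema("union", Arrays.asList(branches));
    }

    boolean isUnion() {
        return type.equals("union");
    }
}

final class StaticResolution {
    // Writer type -> reader types it may be promoted to (assumed from the spec).
    private static final Map<String, Set<String>> PROMOTIONS = new HashMap<>();
    static {
        PROMOTIONS.put("int", new HashSet<>(Arrays.asList("long", "float", "double")));
        PROMOTIONS.put("long", new HashSet<>(Arrays.asList("float", "double")));
        PROMOTIONS.put("float", new HashSet<>(Arrays.asList("double")));
        PROMOTIONS.put("string", new HashSet<>(Arrays.asList("bytes")));
        PROMOTIONS.put("bytes", new HashSet<>(Arrays.asList("string")));
    }

    /** True iff the reader can resolve every datum the writer schema could produce. */
    static boolean canRead(ToySchema reader, ToySchema writer) {
        if (writer.isUnion()) {
            // Any branch may be selected at write time, so all must be readable.
            for (ToySchema branch : writer.branches) {
                if (!canRead(reader, branch)) {
                    return false;
                }
            }
            return true;
        }
        if (reader.isUnion()) {
            // Non-union writer: at least one reader branch must match it.
            for (ToySchema branch : reader.branches) {
                if (canRead(branch, writer)) {
                    return true;
                }
            }
            return false;
        }
        if (reader.type.equals(writer.type)) {
            return true;
        }
        Set<String> promotable = PROMOTIONS.get(writer.type);
        return promotable != null && promotable.contains(reader.type);
    }
}
```

Under these assumptions, a reader of union(long, string) can read a writer of
union(int, string), because int promotes to long and string matches string,
while a plain long reader cannot, because the string branch has no match. A
real check would also have to recurse into records, arrays, maps, enums and
fixed types, and account for field defaults and aliases.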
Aaron Kimball 2013-02-01, 06:17
Aaron Kimball 2013-02-01, 21:42
Doug Cutting 2013-02-04, 22:30