NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Avro >> mail # user >> static schema validation


Re: static schema validation
That sounds like what I'm looking for. I'll take a look!

Thanks,
- Aaron
On Jan 31, 2013 10:39 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Aaron,
>
> You can use the SchemaNormalization class to test if two schemas are
> effectively identical:
>
>
> http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
>
> AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
> can, with resolution, read the other) and to combine multiple schemas
> into a single schema that subsumes them all.
>
> https://issues.apache.org/jira/browse/AVRO-816
>
> Bob Cotton recently suggested that we should commit some form of this.
> I'd be happy to do this if others agree.
>
> Doug
>
> On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> > Does Avro have an API to allow you to tell whether two schemas are a
> > match, statically?
> >
> > i.e., schema1.canRead(schema2) /** return true iff schema1 can be used
> > as a reader schema for schema2 */
> >
> > From my (admittedly cursory) scan of the docs + source, it seems like
> > there isn't something quite that concise, though maybe this can be
> > accomplished using ResolvingGrammarGenerator?
> >
> > I'm pessimistic because of the following quote from the spec [1]:
> >
> > [matching] if both are unions:
> > The first schema in the reader's union that matches the selected writer's
> > union schema is recursively resolved against it. If none match, an error
> > is signalled.
> >
> > That sentence makes me think it's context-dependent; I interpret "the
> > selected writer's union schema" as "the schema of the actual thing
> > written in a data buffer, which is one of the possible schemas the
> > writer declared in her union type". That is, you can only tell whether
> > schema R can be a reader for some other schema W in terms of a literal
> > record written by W; it cannot be deduced statically for all possible
> > records that can be encoded with schema W. Is this interpretation
> > correct? If so, does anyone have any ideas how to ensure the best bounds
> > on statically-guaranteed backward compatibility between a given reader
> > and writer?
> >
> > Thanks,
> > - Aaron
> >
> > [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>
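A sketch of why the union rule quoted above can still be checked statically: the writer may select any branch of its union at write time, so a reader is guaranteed to handle every datum only if it can resolve each writer branch. The Java below illustrates this with a hypothetical toy model (`StaticResolution`, `ToySchema`, `Primitive`, `Union` and `canRead` are illustrative names, not Avro APIs). It assumes only primitive types, the spec's numeric promotions, and unions; records, enums, fixed, arrays and maps are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT an Avro API: a toy schema model with only
// primitive types and unions, enough to show that union resolution can be
// checked statically, without looking at any written datum.
final class StaticResolution {

    /** Toy schema: either a primitive type name or a union of branches. */
    interface ToySchema {}

    record Primitive(String name) implements ToySchema {}

    record Union(List<ToySchema> branches) implements ToySchema {}

    // Numeric promotions from the spec's "Schema Resolution" section:
    // writer type -> reader types it may be promoted to.
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
            "int", Set.of("long", "float", "double"),
            "long", Set.of("float", "double"),
            "float", Set.of("double"));

    /** True iff every datum written with {@code writer} resolves against {@code reader}. */
    static boolean canRead(ToySchema reader, ToySchema writer) {
        // Writer union: any branch may be the "selected" one for some datum,
        // so require the reader to handle every branch. Quantifying over all
        // branches turns the spec's per-datum rule into a static check.
        if (writer instanceof Union w) {
            return w.branches().stream().allMatch(branch -> canRead(reader, branch));
        }
        // Reader union, non-union writer: some reader branch must match.
        // With primitives only, "matches" and "resolves" coincide; with named
        // types the spec resolves against the FIRST matching branch, so a
        // fuller check would need to pick that branch specifically.
        if (reader instanceof Union r) {
            return r.branches().stream().anyMatch(branch -> canRead(branch, writer));
        }
        // Both primitive: same type, or an allowed promotion.
        String readerType = ((Primitive) reader).name();
        String writerType = ((Primitive) writer).name();
        return readerType.equals(writerType)
                || PROMOTIONS.getOrDefault(writerType, Set.of()).contains(readerType);
    }

    public static void main(String[] args) {
        ToySchema nullOrInt = new Union(List.of(new Primitive("null"), new Primitive("int")));
        ToySchema nullOrLong = new Union(List.of(new Primitive("null"), new Primitive("long")));
        ToySchema stringOrInt = new Union(List.of(new Primitive("string"), new Primitive("int")));

        System.out.println(canRead(nullOrLong, nullOrInt));   // true: null->null, int->long
        System.out.println(canRead(nullOrInt, nullOrLong));   // false: long cannot narrow to int
        System.out.println(canRead(nullOrLong, stringOrInt)); // false: no branch reads string
    }
}
```

The check is conservative: a reader/writer pair it rejects may still work for data that never uses the unreadable writer branch, but a pair it accepts is safe for every datum, which is the kind of static guarantee asked about above. A fuller version would add the remaining rules from the spec (for example, reader record fields absent from the writer must have defaults). Later Avro releases include `org.apache.avro.SchemaCompatibility`, which may be worth checking before hand-rolling a check like this.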