Avro user mailing list


Re: static schema validation
I think AVRO-816 should help you. Neither S1 nor S2 subsumes the other,
but S3 subsumes them both.

Doug

On Fri, Feb 1, 2013 at 1:42 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Ok, I read the patch and JIRA issue a bit more thoroughly. Schema
> normalization just tells you if two schemas differ only in the unimportant
> bits.
>
> As I understand it, subsumes() will tell you if a schema is a strict
> superset of another. For example, if S1 is a record of { a:int, b:string }
> and S2 is a record of { a:int, b:string, c:int }, then S2.subsumes(S1)
> would return true but not vice versa. Is that correct?
>
> The functionality I need is a guarantee that two writers who write to a
> common data store with possibly different schemas can still read one
> another's data without a deserialization error. They need to agree ahead of
> time that they're going to write data with schemas "close enough" that the
> other one can always deserialize the data into their preferred format.
>
> S1 and S2 above do not meet this criterion, because S2 cannot read a record
> written with S1: it doesn't know how to instantiate field 'c'.
>
> However, S1 and S3 = { a:int, b:string, c:int default 0 } would meet my
> criterion.
>
> Does AVRO-816 help me answer this question?
> Thanks,
> - Aaron
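Aaron's "close enough" criterion can be made concrete with a small model. The following is a minimal Python sketch, not Avro's API: schemas are plain dicts, only primitives and records are handled, type promotions (such as int to long) and unions are ignored, and the names `can_read`, `mutually_readable`, and the record name `R` are hypothetical. It applies the spec's record-resolution rules as summarized here: fields only the writer has are skipped, and fields only the reader has must carry a default. (Later Avro releases added a Java class, `org.apache.avro.SchemaCompatibility`, for this kind of check; the sketch does not use it.)

```python
# Minimal sketch, NOT Avro's API: schemas are modelled as plain Python values.
# Primitives are type-name strings; records are dicts with "type", "name",
# and "fields". Unions and type promotions (e.g. int -> long) are ignored.

def can_read(reader, writer):
    """True iff data written with `writer` can be resolved into `reader`."""
    if isinstance(reader, str) or isinstance(writer, str):
        return reader == writer  # primitives must match exactly in this model
    if reader["name"] != writer["name"]:
        return False
    writer_fields = {f["name"]: f["type"] for f in writer["fields"]}
    for field in reader["fields"]:
        if field["name"] in writer_fields:
            # Field in both schemas: its types must resolve recursively.
            if not can_read(field["type"], writer_fields[field["name"]]):
                return False
        elif "default" not in field:
            # Reader-only field with no default: nothing to fill it with.
            return False
    # Fields only the writer has are skipped when reading.
    return True

def mutually_readable(s1, s2):
    """Aaron's criterion: each schema can read data written with the other."""
    return can_read(s1, s2) and can_read(s2, s1)

# The thread's examples; the record name "R" is a hypothetical placeholder.
S1 = {"type": "record", "name": "R",
      "fields": [{"name": "a", "type": "int"}, {"name": "b", "type": "string"}]}
S2 = {"type": "record", "name": "R",
      "fields": S1["fields"] + [{"name": "c", "type": "int"}]}
S3 = {"type": "record", "name": "R",
      "fields": S1["fields"] + [{"name": "c", "type": "int", "default": 0}]}

print(can_read(S2, S1))           # False: S1 data lacks 'c', S2 has no default
print(mutually_readable(S1, S3))  # True
```

The default on `c` is what makes S1 and S3 compatible in both directions: S3 fills in `c` when reading S1 data, and S1 skips `c` when reading S3 data.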
>
>
>
> On Thu, Jan 31, 2013 at 10:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>>
>> That sounds like what I'm looking for. I'll take a look!
>>
>> Thanks,
>> - Aaron
>>
>> On Jan 31, 2013 10:39 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>>>
>>> Aaron,
>>>
>>> You can use the SchemaNormalization class to test if two schemas are
>>> effectively identical:
>>>
>>>
>>> http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
>>>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
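The idea behind Parsing Canonical Form is to strip everything that does not affect how data is parsed and then compare what remains. Below is a simplified Python sketch of that idea for schemas given as parsed JSON; it is not `SchemaNormalization` itself. It implements only some of the spec's transformations (collapsing `{"type": "int"}` to `"int"`, keeping only parsing-relevant attributes, fixing key order, removing whitespace) and omits the full-name and string/integer normalization steps. The functions `canonical`, `parsing_form`, and `effectively_identical` are hypothetical names.

```python
import json

# Attributes kept by the spec's [STRIP] step, listed in its [ORDER] order.
KEPT_ATTRIBUTES = ["name", "type", "fields", "symbols", "items", "values", "size"]
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def canonical(schema):
    """Simplified canonical form of a schema given as parsed JSON."""
    if isinstance(schema, list):  # union: canonicalize each branch
        return [canonical(branch) for branch in schema]
    if not isinstance(schema, dict):  # type name, symbol, or size
        return schema
    t = schema.get("type")
    if isinstance(t, str) and t in PRIMITIVES:
        # [PRIMITIVES]: {"type": "int", ...} becomes just "int".
        return t
    out = {}
    for key in KEPT_ATTRIBUTES:  # inserting in this order gives [ORDER]
        if key not in schema:
            continue
        if key == "fields":
            # Fields keep only their name and canonicalized type
            # (doc, default, aliases, and order are dropped).
            out[key] = [{"name": f["name"], "type": canonical(f["type"])}
                        for f in schema["fields"]]
        else:
            out[key] = canonical(schema[key])
    return out

def parsing_form(schema):
    # [WHITESPACE]: compact separators, no spaces or newlines.
    return json.dumps(canonical(schema), separators=(",", ":"))

def effectively_identical(s1, s2):
    return parsing_form(s1) == parsing_form(s2)

a = {"type": "record", "name": "R", "doc": "first version",
     "fields": [{"name": "x", "type": {"type": "int"}, "default": 1}]}
b = {"name": "R", "type": "record", "fields": [{"name": "x", "type": "int"}]}

print(effectively_identical(a, b))  # True: they differ only in doc/default/spelling
print(parsing_form(b))  # {"name":"R","type":"record","fields":[{"name":"x","type":"int"}]}
```

Note that this answers a narrower question than Aaron's: two schemas that differ in a field default are "identical" here, even though the default matters for resolution.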
>>>
>>> AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
>>> can, with resolution, read the other) and to combine multiple schemas
>>> into a single schema that subsumes them all.
>>>
>>> https://issues.apache.org/jira/browse/AVRO-816
>>>
>>> Bob Cotton recently suggested that we should commit some form of this.
>>>  I'd be happy to do this if others agree.
>>>
>>> Doug
>>>
>>> On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>>> > Does Avro have an API that lets you tell, statically, whether two
>>> > schemas are a match?
>>> >
>>> > That is, something like schema1.canRead(schema2) /** return true iff
>>> > schema1 can be used as a reader schema for schema2 */
>>> >
>>> > From my (admittedly cursory) scan of the docs and source, it seems
>>> > there isn't anything quite that concise, though maybe this can be
>>> > accomplished using ResolvingGrammarGenerator?
>>> >
>>> > I'm pessimistic because of the following quote from the spec [1]:
>>> >
>>> > [matching] if both are unions:
>>> > The first schema in the reader's union that matches the selected
>>> > writer's union schema is recursively resolved against it. If none
>>> > match, an error is signalled.
>>> >
>>> > That sentence makes me think resolution is context dependent. I
>>> > interpret "the selected writer's union schema" as "the schema of the
>>> > actual thing written in a data buffer, which is one of the possible
>>> > schemas the writer declared in her union type". That is, you can only
>>> > tell whether schema R can be a reader for some other schema W in terms
>>> > of a literal record written by W; it cannot be deduced statically for
>>> > all possible records that can be encoded with schema W. Is this
>>> > interpretation correct? If so, does anyone have ideas on how to ensure
>>> > the best bounds on statically guaranteed backward compatibility between
>>> > a given reader and writer?
>>> >
>>> > Thanks,
>>> > - Aaron
>>> >
>>> > [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
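On the union question: the quoted rule does refer to the branch actually written, but a static guarantee can still be obtained conservatively by requiring that every branch of the writer's union resolve, since any of them might be the one selected at write time. The minimal Python sketch below uses a hypothetical model (primitives as strings, records as dicts, unions as lists); `can_read`, `_matches`, and `PROMOTIONS` are illustrative names, and record matching is simplified to equal names, ignoring namespaces and aliases. For a reader union it follows the quoted rule: the first matching branch is resolved and later branches are not tried.

```python
# Hypothetical model, NOT Avro's API: primitives are strings, records are
# dicts with "name" and "fields", and unions are lists of branches.

# (writer type, reader type) pairs allowed as promotions by the spec.
PROMOTIONS = {
    ("int", "long"), ("int", "float"), ("int", "double"),
    ("long", "float"), ("long", "double"), ("float", "double"),
    ("string", "bytes"), ("bytes", "string"),
}

def _matches(reader, writer):
    """Shallow match between two non-union schemas."""
    if isinstance(reader, str) and isinstance(writer, str):
        return reader == writer or (writer, reader) in PROMOTIONS
    if isinstance(reader, dict) and isinstance(writer, dict):
        return reader["name"] == writer["name"]  # simplified record match
    return False

def can_read(reader, writer):
    """True iff EVERY value encodable with `writer` resolves into `reader`."""
    if isinstance(writer, list):
        # Any writer branch may be selected at write time, so all must resolve.
        return all(can_read(reader, branch) for branch in writer)
    if isinstance(reader, list):
        # The first matching reader branch is resolved; there is no fallback.
        for branch in reader:
            if _matches(branch, writer):
                return can_read(branch, writer)
        return False
    if not _matches(reader, writer):
        return False
    if isinstance(reader, dict):
        writer_fields = {f["name"]: f["type"] for f in writer["fields"]}
        for field in reader["fields"]:
            if field["name"] in writer_fields:
                if not can_read(field["type"], writer_fields[field["name"]]):
                    return False
            elif "default" not in field:
                return False
    return True

print(can_read(["null", "long"], ["null", "int"]))    # True: int promotes to long
print(can_read(["null", "string"], ["null", "int"]))  # False: no branch reads int
print(can_read("int", ["null", "int"]))               # False: null branch fails
```

The check is conservative: it rejects a reader that would fail only on a writer branch that is never actually written, which is the price of deciding compatibility without looking at the data.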