Avro >> mail # user >> static schema validation


Aaron Kimball 2013-01-30, 23:17
Doug Cutting 2013-01-31, 18:39
Aaron Kimball 2013-02-01, 06:17
Aaron Kimball 2013-02-01, 21:42

Re: static schema validation
I think AVRO-816 should help you. Neither S1 nor S2 subsumes the
other, but S3 subsumes them both.

Doug
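
To make the S1/S2/S3 example concrete, here is a minimal sketch of the record rule from the Avro spec's Schema Resolution section: a reader can decode a writer's records if every reader field is either present in the writer's schema with the same type or carries a default. This is not the AVRO-816 patch and not Avro's API. The class RecordResolutionSketch, its Field type, and the methods canRead and mutuallyReadable are hypothetical names for illustration, and the model ignores type promotion, aliases, and nested schemas. The patch's subsumes() may define the relation differently from this simplified check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of record schema resolution (not Avro's API).
public class RecordResolutionSketch {

    /** A record field: name, primitive type name, and whether it declares a default. */
    static final class Field {
        final String name;
        final String type;
        final boolean hasDefault;

        Field(String name, String type, boolean hasDefault) {
            this.name = name;
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    /**
     * Returns true if the reader schema can decode any record written with the
     * writer schema: each reader field must exist in the writer with the same
     * type, or have a default. Extra writer fields are ignored.
     */
    static boolean canRead(List<Field> reader, List<Field> writer) {
        Map<String, String> writerTypes = new HashMap<>();
        for (Field w : writer) {
            writerTypes.put(w.name, w.type);
        }
        for (Field r : reader) {
            String writerType = writerTypes.get(r.name);
            if (writerType == null) {
                if (!r.hasDefault) {
                    return false; // reader cannot fill in the missing field
                }
            } else if (!writerType.equals(r.type)) {
                return false; // assumption: no type promotion in this sketch
            }
        }
        return true;
    }

    /** The criterion from the thread: each side can read the other's data. */
    static boolean mutuallyReadable(List<Field> x, List<Field> y) {
        return canRead(x, y) && canRead(y, x);
    }

    // S1 = { a:int, b:string }
    static final List<Field> S1 = Arrays.asList(
            new Field("a", "int", false), new Field("b", "string", false));
    // S2 = { a:int, b:string, c:int }
    static final List<Field> S2 = Arrays.asList(
            new Field("a", "int", false), new Field("b", "string", false),
            new Field("c", "int", false));
    // S3 = { a:int, b:string, c:int default 0 }
    static final List<Field> S3 = Arrays.asList(
            new Field("a", "int", false), new Field("b", "string", false),
            new Field("c", "int", true));

    public static void main(String[] args) {
        System.out.println("S2 reads S1: " + canRead(S2, S1));
        System.out.println("S3 reads S1: " + canRead(S3, S1));
        System.out.println("S1 and S2 mutually readable: " + mutuallyReadable(S1, S2));
        System.out.println("S1 and S3 mutually readable: " + mutuallyReadable(S1, S3));
    }
}
```

Under this model the two-writer guarantee is simply canRead applied in both directions, which is why a default on field c is what separates S3 from S2.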

On Fri, Feb 1, 2013 at 1:42 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Ok, I read the patch and JIRA issue a bit more thoroughly. Schema
> normalization just tells you if two schemas differ only in the unimportant
> bits.
>
> As I understand it, subsumes() will tell you if a schema is a strict
> superset of another, i.e., if S1 is a record of { a:int, b:string } and S2
> is a record of { a:int, b:string, c:int }, then S2.subsumes(S1) would
> return true but not vice versa. Is that correct?
>
> The functionality I need is to guarantee that two writers who write to a
> common data store with possibly different schemas can still read one
> another's data without a deserialization error. They need to agree ahead of
> time that they're going to write data with schemas "close enough" that the
> other one can always deserialize the data into their preferred format.
>
> S1 and S2 above do not meet this criterion, because S2 cannot read a
> record written with S1: it doesn't know how to instantiate field 'c'.
>
> However, S1 and S3 = { a:int, b:string, c:int default 0 } would meet my
> criterion.
>
> Does AVRO-816 help me answer this question?
> Thanks,
> - Aaron
>
>
>
> On Thu, Jan 31, 2013 at 10:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>>
>> That sounds like what I'm looking for. I'll take a look!
>>
>> Thanks,
>> - Aaron
>>
>> On Jan 31, 2013 10:39 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>>>
>>> Aaron,
>>>
>>> You can use the SchemaNormalization class to test if two schemas are
>>> effectively identical:
>>>
>>>
>>> http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
>>>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
>>>
>>> AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
>>> can, with resolution, read the other) and to combine multiple schemas
>>> into a single schema that subsumes them all.
>>>
>>> https://issues.apache.org/jira/browse/AVRO-816
>>>
>>> Bob Cotton recently suggested that we should commit some form of this.
>>>  I'd be happy to do this if others agree.
>>>
>>> Doug
>>>
>>> On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>>> > Does Avro have an API to allow you to tell whether two schemas are a
>>> > match, statically?
>>> >
>>> > i.e., schema1.canRead(schema2) /** return true iff schema1 can be used
>>> > as a reader schema for schema2 */
>>> >
>>> > From my (admittedly cursory) scan of the docs + source, it seems like
>>> > there isn't something quite that concise, though maybe this can be
>>> > accomplished using ResolvingGrammarGenerator?
>>> >
>>> > I'm pessimistic because of the following quote from the spec [1]:
>>> >
>>> > [matching] if both are unions:
>>> > The first schema in the reader's union that matches the selected
>>> > writer's union schema is recursively resolved against it. if none
>>> > match, an error is signalled.
>>> >
>>> > That sentence makes me think it's context dependent; I interpret "the
>>> > selected writer's union schema" as "the schema of the actual thing
>>> > written in a data buffer, which is one of the possible schemas the
>>> > writer declared in her union type". i.e., you can only tell if schema
>>> > R can be a reader for some other schema W in terms of a literal record
>>> > written by W, and cannot be deduced statically for all possible
>>> > records that can be encoded with schema W.  Is this interpretation
>>> > correct? If so, does anyone have any ideas how to ensure the best
>>> > bounds on statically-guaranteed backward compatibility between a given
>>> > reader and writer?
>>> >
>>> > Thanks,
>>> > - Aaron
>>> >
>>> > [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>
>
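
On the union question raised in the original message, one way to turn the per-datum rule into a static guarantee is to require it for every branch the writer could select. The sketch below is a simplified, hypothetical model and not Avro's API: unions are lists of type names, "matches" is reduced to name equality, and the class UnionResolutionSketch with its methods canReadDatum and canReadAll is invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of union resolution (not Avro's API).
public class UnionResolutionSketch {

    /**
     * Per-datum check, mirroring the spec sentence: the datum was written with
     * one selected branch of the writer's union, and the reader needs some
     * branch that matches it. Matching is reduced to name equality here.
     */
    static boolean canReadDatum(List<String> readerUnion, String writtenBranch) {
        return readerUnion.contains(writtenBranch);
    }

    /**
     * Static check: the reader can decode everything the writer could ever
     * emit only if every writer branch has a matching reader branch.
     */
    static boolean canReadAll(List<String> readerUnion, List<String> writerUnion) {
        for (String branch : writerUnion) {
            if (!canReadDatum(readerUnion, branch)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> reader = Arrays.asList("null", "string");
        List<String> writer = Arrays.asList("null", "string", "int");

        // A particular datum written as "string" resolves without error...
        System.out.println("datum 'string' readable: " + canReadDatum(reader, "string"));
        // ...but the schema pair is not statically safe, since "int" may be written.
        System.out.println("all writer data readable: " + canReadAll(reader, writer));
    }
}
```

The design point is that resolution of a single datum depends on which branch was written, while a static guarantee quantifies over all writer branches, so it can be decided from the two schemas alone.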