Re: Nested schema issue
Scott Carey 2012-05-01, 23:20
On 5/1/12 9:47 AM, "Peter Cameron" <[EMAIL PROTECTED]> wrote:

>I'm having a problem with nesting schemas. A very brief overview of why
>we're using Avro (successfully so far) is:
>
>o code generation not required
>o small binary format
>o dynamic use of schemas at runtime
>
>We're doing a flavour of RPC, and the reason we're not using Avro's IDL
>and flavour of RPC is because the endpoint is not necessarily a Java
>platform (C# and Java for our purposes), and only the Java
>implementation of Avro has RPC. Hence no Avro RPC for us.
>
>I'm aware that Avro doesn't import nested schemas out of the box. We
>need that functionality as we're exposed to schemas over which we have
>no control, and in the interests of maintainability, these schemas are
>nicely partitioned and are referenced as types from within other
>schemas. So, for example, an address schema refers to a
>some.domain.location object by having a field of type
>"some.domain.location". Note that our runtime has no knowledge of any
>some.domain package (e.g. address or location objects). Only the
>endpoints know about some.domain. (A layer at our endpoint runtime
>serialises any unknown, i.e. non-primitive, objects as bytestreams.)
>
>I implemented a schema cache which intelligently imports schemas on the
>fly, so adding the address schema to the cache automatically adds the
>location schema that it refers to. The cache uses Avro's schema to parse
>an added schema, catches parse exceptions, looks at the exception
>message to see whether or not the error is due to a missing or undefined
>type, and thus goes off to import the needed schema. Brittle, I know,
>but no other way for us. We need this functionality, and nothing else
>comes close to Avro.

On the Java side, recent versions have a Parser that can deal with schema
import. It requires that a schema be defined before use, however. Perhaps
we can add a callback to the API for returning undefined schemas as they
are found.
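
One way to satisfy the "defined before use" requirement without relying on
exception messages is to sort the schemas by their references before handing
them to a parser. The following is only a sketch in Python that works on the
schema JSON itself; it does not call any Avro API. The function names
(full_name, referenced_names, definition_order) are made up for illustration,
and it assumes each schema file defines one top-level named type and refers
to other types by their full name, as in the example in this thread.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema):
    """Full name of a named type: namespace + '.' + name (hypothetical helper)."""
    name = schema["name"]
    namespace = schema.get("namespace")
    return name if "." in name or not namespace else namespace + "." + name


def referenced_names(node):
    """Yield every type that is referenced by name inside a schema.

    Assumption: references are written as full names, e.g. "some.domain.location".
    """
    if isinstance(node, str):
        if node not in PRIMITIVES:
            yield node
    elif isinstance(node, list):  # a union: check each branch
        for branch in node:
            yield from referenced_names(branch)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind in ("record", "error"):
            for field in node.get("fields", []):
                yield from referenced_names(field["type"])
        elif kind == "array":
            yield from referenced_names(node["items"])
        elif kind == "map":
            yield from referenced_names(node["values"])
        elif kind not in ("enum", "fixed"):
            yield from referenced_names(kind)  # e.g. {"type": "string"}


def definition_order(schemas):
    """Return full names ordered so each type comes after the types it uses."""
    by_name = {full_name(s): s for s in schemas}
    ordered, state = [], {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("circular reference involving " + name)
        if name not in by_name:
            raise KeyError("undefined type: " + name)
        state[name] = "visiting"
        for dependency in referenced_names(by_name[name]):
            if dependency != name:  # a record may refer to itself
                visit(dependency)
        state[name] = "done"
        ordered.append(name)

    for name in by_name:
        visit(name)
    return ordered


LOCATION = json.loads("""
{"name": "location", "type": "record", "namespace": "some.domain",
 "fields": [{"name": "latitude", "type": "float"},
            {"name": "longitude", "type": "float"}]}
""")

ADDRESS = json.loads("""
{"name": "address", "type": "record", "namespace": "some.domain",
 "fields": [{"name": "street", "type": "string"},
            {"name": "city", "type": "string"},
            {"name": "position1", "type": "some.domain.location"},
            {"name": "position2", "type": "some.domain.location"}]}
""")

if __name__ == "__main__":
    # address is listed first on purpose; location still comes out first
    print(definition_order([ADDRESS, LOCATION]))
```

Parsing the schemas in the returned order means location is already known
when address is parsed, however many fields refer to it.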

>So far so good, until today when I hit a corner case.
>
>Say I have an address object that has two fields, called position1 and
>position2. If position1 and position2 are non-primitive types, then the
>address schema doesn't parse so presumably is an invalid Avro schema.
>The error concerns redefining the location type. Here's the example:
>
>location schema
>===============
>{
>    "name": "location",
>    "type": "record",
>    "namespace" : "some.domain",
>    "fields" :
>    [
>        {
>            "name": "latitude",
>            "type": "float"
>        },
>        {
>            "name": "longitude",
>            "type": "float"
>        }
>    ]
>}
>
>address schema
>==============
>{
>    "name": "address",
>    "type": "record",
>    "namespace" : "some.domain",
>    "fields" :
>    [
>        {
>            "name": "street",
>            "type": "string"
>        },
>        {
>            "name": "city",
>            "type": "string"
>        },
>        {
>            "name": "position1",
>            "type": "some.domain.location"
>        },
>        {
>            "name": "position2",
>            "type": "some.domain.location"
>        }
>    ]
>}
>
>
>Now, an answer of having a list of positions as a field is not an answer
>for us, as we need to solve the general issue of a schema with more than
>one instance of the same nested type i.e. my problem is not with an
>address or location schema.
>
>Can this be done? This is potentially a blocker for us.

This should be possible. A named type can be used for multiple
differently named fields in a record.
Is the parse error in C# or Java? What is the error?

>
>cheers,
>Peter
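
The error text is not given in the thread, so the following is only a guess
at one possible cause. In Avro a named type may be defined once per schema
and must be referred to by its full name after that. If an importing cache
pastes the full location definition into both position1 and position2, a
parser would see location defined twice. The sketch below, in Python and
working on the schema JSON only (no Avro API is called), inlines a definition
at its first use and leaves later uses as name references. The names
inline_once and full_name are hypothetical, and wrapper forms such as
{"type": {...}} are not handled.

```python
import json


def full_name(schema):
    """Full name of a named type: namespace + '.' + name (hypothetical helper)."""
    name = schema["name"]
    namespace = schema.get("namespace")
    return name if "." in name or not namespace else namespace + "." + name


def inline_once(schema, definitions, seen):
    """Return a copy of schema where each known named type is defined at its
    first use and referenced by full name everywhere after that.

    definitions maps full names to schema dicts; seen collects the names that
    have already been defined in the output.
    """
    if isinstance(schema, str):
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_once(definitions[schema], definitions, seen)
        return schema  # a primitive, or a name that is already defined
    if isinstance(schema, list):  # a union: process each branch in order
        return [inline_once(branch, definitions, seen) for branch in schema]
    out = dict(schema)  # shallow copy so the input is not modified
    kind = out.get("type")
    if kind in ("record", "error"):
        seen.add(full_name(out))
        out["fields"] = [
            {**field, "type": inline_once(field["type"], definitions, seen)}
            for field in out["fields"]
        ]
    elif kind == "array":
        out["items"] = inline_once(out["items"], definitions, seen)
    elif kind == "map":
        out["values"] = inline_once(out["values"], definitions, seen)
    elif kind in ("enum", "fixed"):
        seen.add(full_name(out))
    return out


LOCATION = {
    "name": "location", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "latitude", "type": "float"},
               {"name": "longitude", "type": "float"}],
}

ADDRESS = {
    "name": "address", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "street", "type": "string"},
               {"name": "city", "type": "string"},
               {"name": "position1", "type": "some.domain.location"},
               {"name": "position2", "type": "some.domain.location"}],
}

if __name__ == "__main__":
    combined = inline_once(ADDRESS, {"some.domain.location": LOCATION}, set())
    print(json.dumps(combined, indent=2))
```

In the combined schema position1 carries the full location record and
position2 stays as the string "some.domain.location", so the type is defined
exactly once and used twice.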