Re: Nested schema issue


On 5/1/12 9:47 AM, "Peter Cameron" <[EMAIL PROTECTED]> wrote:

>I'm having a problem with nesting schemas. A very brief overview of why
>we're using Avro (successfully so far) is:
>
>o code generation not required
>o small binary format
>o dynamic use of schemas at runtime
>
>We're doing a flavour of RPC, and the reason we're not using Avro's IDL
>and flavour of RPC is that the endpoint is not necessarily a Java
>platform (C# and Java for our purposes), and only the Java
>implementation of Avro has RPC. Hence no Avro RPC for us.
>
>I'm aware that Avro doesn't import nested schemas out of the box. We
>need that functionality as we're exposed to schemas over which we have
>no control, and in the interests of maintainability, these schemas are
>nicely partitioned and are referenced as types from within other
>schemas. So, for example, an address schema refers to a
>some.domain.location object by having a field of type
>"some.domain.location". Note that our runtime has no knowledge of any
>some.domain package (e.g. address or location objects). Only the
>endpoints know about some.domain. (A layer at our endpoint runtime
>serialises any unknown i.e. non-primitive objects as bytestreams.)
>
>I implemented a schema cache which intelligently imports schemas on the
>fly, so adding the address schema to the cache automatically adds the
>location schema that it refers to. The cache uses Avro's schema to parse
>an added schema, catches parse exceptions, looks at the exception
>message to see whether or not the error is due to a missing or undefined
>type, and thus goes off to import the needed schema. Brittle, I know,
>but no other way for us. We need this functionality, and nothing else
>comes close to Avro.

On the Java side, recent versions have a Parser that can deal with schema
import.  It requires that a schema be defined before use, however.  Perhaps
we can add a callback to the API for returning undefined schemas as they
are found.
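
For illustration, a minimal sketch of that Parser usage (assuming Avro 1.5+
on the Java side; the class and constant names below are just for the
example): one Schema.Parser instance remembers every named type it has
parsed, so the location schema only has to go in before the address schema
that references it -- no error-message sniffing needed.

import org.apache.avro.Schema;

public class NestedSchemaParse {

    // The two schemas from the message above, as JSON strings.
    static final String LOCATION_SCHEMA =
        "{ \"name\": \"location\", \"type\": \"record\", \"namespace\": \"some.domain\","
      + "  \"fields\": ["
      + "    { \"name\": \"latitude\",  \"type\": \"float\" },"
      + "    { \"name\": \"longitude\", \"type\": \"float\" }"
      + "  ] }";

    static final String ADDRESS_SCHEMA =
        "{ \"name\": \"address\", \"type\": \"record\", \"namespace\": \"some.domain\","
      + "  \"fields\": ["
      + "    { \"name\": \"street\",    \"type\": \"string\" },"
      + "    { \"name\": \"city\",      \"type\": \"string\" },"
      + "    { \"name\": \"position1\", \"type\": \"some.domain.location\" },"
      + "    { \"name\": \"position2\", \"type\": \"some.domain.location\" }"
      + "  ] }";

    public static void main(String[] args) {
        // A single Parser accumulates named types across parse() calls,
        // so dependencies just need to be parsed before the schemas that
        // reference them.
        Schema.Parser parser = new Schema.Parser();
        parser.parse(LOCATION_SCHEMA);                 // defines some.domain.location
        Schema address = parser.parse(ADDRESS_SCHEMA); // both references now resolve

        System.out.println(address.getField("position1").schema().getFullName());
        System.out.println(address.getField("position2").schema().getFullName());
        // both print: some.domain.location
    }
}

The same idea should carry over to the cache described above: keep one
Parser per cache and feed dependencies in first.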

>
>So far so good, until today when I hit a corner case.
>
>Say I have an address object that has two fields, called position1 and
>position2. If position1 and position2 are non-primitive types, then the
>address schema doesn't parse, so presumably it is an invalid Avro schema.
>The error concerns redefining the location type. Here's the example:
>
>location schema
>==============
>{
>     "name": "location",
>     "type": "record",
>     "namespace" : "some.domain",
>     "fields" :
>     [
>         {
>             "name": "latitude",
>             "type": "float"
>         },
>         {
>             "name": "longitude",
>             "type": "float"
>         }
>     ]
>}
>
>address schema
>==============
>{
>     "name": "address",
>     "type": "record",
>     "namespace" : "some.domain",
>     "fields" :
>     [
>         {
>             "name": "street",
>             "type": "string"
>         },
>         {
>             "name": "city",
>             "type": "string"
>         },
>         {
>             "name": "position1",
>             "type": "some.domain.location"
>         },
>         {
>             "name": "position2",
>             "type": "some.domain.location"
>         }
>     ]
>}
>
>
>Now, using a list of positions as a single field is not an answer
>for us, as we need to solve the general issue of a schema with more than
>one instance of the same nested type, i.e. my problem is not specific to
>the address or location schemas.
>
>Can this be done? This is potentially a blocker for us.

This should be possible.  A named type can be used for multiple
differently named fields in a record. Is the parse error in C# or Java?
What is the error?
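
To illustrate the point (a sketch only; the class name is mine, and the
rule comes from the Avro spec, so it should hold for C# as well as Java):
within one schema document the first occurrence of a named type defines it,
and later fields refer to it by name, so an address record can hold two
location fields as long as location is defined only once.

import org.apache.avro.Schema;

public class DefineOnceReferenceTwice {
    // location is defined inline under position1 and referenced by name
    // under position2; it inherits the some.domain namespace from address.
    static final String ADDRESS_SCHEMA =
        "{ \"name\": \"address\", \"type\": \"record\", \"namespace\": \"some.domain\","
      + "  \"fields\": ["
      + "    { \"name\": \"position1\", \"type\":"
      + "      { \"name\": \"location\", \"type\": \"record\","
      + "        \"fields\": ["
      + "          { \"name\": \"latitude\",  \"type\": \"float\" },"
      + "          { \"name\": \"longitude\", \"type\": \"float\" }"
      + "        ] } },"
      + "    { \"name\": \"position2\", \"type\": \"some.domain.location\" }"
      + "  ] }";

    public static void main(String[] args) {
        Schema address = new Schema.Parser().parse(ADDRESS_SCHEMA);
        System.out.println(address.getField("position2").schema().getFullName());
        // prints: some.domain.location -- no "redefined type" error
    }
}

If the schemas have to stay in separate files, parsing the dependency first
with a shared Parser (as in the earlier sketch) should likewise avoid any
redefinition.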

>
>cheers,
>Peter
>