Avro user mailing list: Nested schema issue


Re: Nested schema issue


On 5/1/12 9:47 AM, "Peter Cameron" <[EMAIL PROTECTED]> wrote:

>I'm having a problem with nesting schemas. A very brief overview of why
>we're using Avro (successfully so far) is:
>
>o code generation not required
>o small binary format
>o dynamic use of schemas at runtime
>
>We're doing a flavour of RPC. We're not using Avro's IDL and flavour of
>RPC because the endpoint is not necessarily a Java platform (C# and Java
>for our purposes), and only the Java implementation of Avro has RPC.
>Hence no Avro RPC for us.
>
>I'm aware that Avro doesn't import nested schemas out of the box. We
>need that functionality because we're exposed to schemas over which we
>have no control; in the interests of maintainability, these schemas are
>nicely partitioned and are referenced as types from within other
>schemas. For example, an address schema refers to a
>some.domain.location object by having a field of type
>"some.domain.location". Note that our runtime has no knowledge of any
>some.domain package (e.g. address or location objects). Only the
>endpoints know about some.domain. (A layer at our endpoint runtime
>serialises any unknown, i.e. non-primitive, objects as byte streams.)
>
>I implemented a schema cache that intelligently imports schemas on the
>fly, so adding the address schema to the cache automatically adds the
>location schema that it refers to. The cache uses Avro's schema parser
>to parse an added schema, catches parse exceptions, and inspects the
>exception message to see whether the error is due to a missing or
>undefined type; if so, it goes off to import the needed schema. Brittle,
>I know, but there's no other way for us. We need this functionality,
>and nothing else comes close to Avro.

On the Java side, recent versions have a parser (Schema.Parser) that can
deal with schema imports. However, it requires that a schema be defined
before it is used. Perhaps we can add a callback to the API for supplying
undefined schemas as they are encountered.
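
A minimal sketch of that callback idea, written in Python for brevity rather than against Avro's Java API. The names `resolve` and `load` are hypothetical and are not part of any Avro release. The sketch walks the parsed JSON, applies Avro's full-name rule (namespace plus name), and calls `load` for each undefined named type before continuing, so the cache never has to inspect exception messages. It handles records, enums, fixed types, arrays, maps and unions only; the `store` dict is an assumed stand-in for wherever the cache keeps its schema files.

```python
import json

# Avro's primitive type names; any other string in a type position names a
# record, enum or fixed type that has to be defined before it is used.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(name, namespace):
    """Avro naming rule: a dotted name is already full, else prefix the namespace."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def resolve(schema, defined, load, namespace=None):
    """Register every named type in `schema` in `defined`, loading missing ones.

    `load(full_name)` is the hypothetical "undefined schema" callback: it
    returns the JSON text of a schema that has not been seen yet.
    """
    if isinstance(schema, list):  # union: resolve each branch
        for branch in schema:
            resolve(branch, defined, load, namespace)
        return
    if isinstance(schema, str):  # a type name, primitive or named
        if schema in PRIMITIVES:
            return
        name = full_name(schema, namespace)
        if name not in defined:
            resolve(json.loads(load(name)), defined, load)  # import it first
        if name not in defined:
            raise KeyError(f"callback did not supply {name}")
        return
    kind = schema["type"]
    if kind in ("record", "error", "enum", "fixed"):
        name = full_name(schema["name"], schema.get("namespace", namespace))
        if name in defined:  # a second *definition* is invalid Avro
            raise ValueError(f"Can't redefine: {name}")
        defined[name] = schema  # register before fields so self-references work
        field_ns = name.rpartition(".")[0] or None
        for field in schema.get("fields", []):
            resolve(field["type"], defined, load, field_ns)
    elif kind == "array":
        resolve(schema["items"], defined, load, namespace)
    elif kind == "map":
        resolve(schema["values"], defined, load, namespace)
    else:  # e.g. {"type": "string"} or {"type": "some.domain.location"}
        resolve(kind, defined, load, namespace)


# Demo with the schemas from this thread.
store = {
    "some.domain.location": json.dumps({
        "name": "location", "type": "record", "namespace": "some.domain",
        "fields": [{"name": "latitude", "type": "float"},
                   {"name": "longitude", "type": "float"}]}),
}
address = {
    "name": "address", "type": "record", "namespace": "some.domain",
    "fields": [{"name": "street", "type": "string"},
               {"name": "city", "type": "string"},
               {"name": "position1", "type": "some.domain.location"},
               {"name": "position2", "type": "some.domain.location"}]}

loaded = []


def load(name):
    loaded.append(name)
    return store[name]


defined = {}
resolve(address, defined, load)
print(sorted(defined))  # ['some.domain.address', 'some.domain.location']
print(loaded)           # ['some.domain.location'] (fetched once, referenced twice)
```

Because a type is registered the moment its definition is seen, the second reference to `some.domain.location` is satisfied by name, and only a genuine second definition is rejected.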

>
>So far so good, until today when I hit a corner case.
>
>Say I have an address object that has two fields, called position1 and
>position2. If position1 and position2 are of the same non-primitive
>type, the address schema doesn't parse, so presumably it is an invalid
>Avro schema. The error concerns redefining the location type. Here's
>the example:
>
>location schema
>===============
>{
>     "name": "location",
>     "type": "record",
>     "namespace" : "some.domain",
>     "fields" :
>     [
>         {
>             "name": "latitude",
>             "type": "float"
>         },
>         {
>             "name": "longitude",
>             "type": "float"
>         }
>     ]
>}
>
>address schema
>==============
>{
>     "name": "address",
>     "type": "record",
>     "namespace" : "some.domain",
>     "fields" :
>     [
>         {
>             "name": "street",
>             "type": "string"
>         },
>         {
>             "name": "city",
>             "type": "string"
>         },
>         {
>             "name": "position1",
>             "type": "some.domain.location"
>         },
>         {
>             "name": "position2",
>             "type": "some.domain.location"
>         }
>     ]
>}
>
>
>Note that changing the schema to hold a list of positions in one field
>is not an answer for us, as we need to solve the general issue of a
>schema with more than one field of the same nested type; my problem is
>not specific to the address or location schemas.
>
>Can this be done? This is potentially a blocker for us.

This should be possible.  A named type can be used for multiple
differently named fields in a record. Is the parse error in C# or Java?
What is the error?
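
A minimal Python sketch of that rule, under an assumption the thread does not confirm: that the cache built the failing schema by substituting the full location definition at every reference. Avro allows a named type to be defined only once per schema (the Java parser reports an error of the form "Can't redefine: <name>"), so any later field must refer to it by its full name. The helpers `inline` and `check_single_definitions` are hypothetical illustrations, not Avro APIs, and only walk records, arrays and unions.

```python
# Avro's primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(name, namespace):
    """Avro naming rule: a dotted name is already full, else prefix the namespace."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def inline(schema, library, once=True):
    """Return a copy of `schema` with references to `library` types expanded.

    once=True: each type is expanded at its first use only; later uses stay
    as name references (valid Avro). once=False: every use is expanded, which
    defines the same named type twice (and never terminates for recursive types).
    """
    emitted = set()

    def walk(node, namespace):
        if isinstance(node, list):  # union
            return [walk(branch, namespace) for branch in node]
        if isinstance(node, str):  # type name
            name = node if node in PRIMITIVES else full_name(node, namespace)
            if name in library and not (once and name in emitted):
                return walk(library[name], namespace)
            return node
        node = dict(node)  # shallow copy; nested parts are rebuilt below
        if node["type"] == "record":
            name = full_name(node["name"], node.get("namespace", namespace))
            emitted.add(name)  # from here on, refer to it by name
            field_ns = name.rpartition(".")[0] or None
            node["fields"] = [dict(f, type=walk(f["type"], field_ns))
                              for f in node["fields"]]
        elif node["type"] == "array":
            node["items"] = walk(node["items"], namespace)
        return node

    return walk(schema, None)


def check_single_definitions(schema):
    """Raise ValueError if any named record is defined more than once."""
    counts = {}

    def walk(node, namespace):
        if isinstance(node, list):
            for branch in node:
                walk(branch, namespace)
        elif isinstance(node, dict) and node["type"] == "record":
            name = full_name(node["name"], node.get("namespace", namespace))
            counts[name] = counts.get(name, 0) + 1
            if counts[name] > 1:
                raise ValueError(f"Can't redefine: {name}")
            for field in node["fields"]:
                walk(field["type"], name.rpartition(".")[0] or None)
        elif isinstance(node, dict) and node["type"] == "array":
            walk(node["items"], namespace)

    walk(schema, None)


location = {"name": "location", "type": "record", "namespace": "some.domain",
            "fields": [{"name": "latitude", "type": "float"},
                       {"name": "longitude", "type": "float"}]}
address = {"name": "address", "type": "record", "namespace": "some.domain",
           "fields": [{"name": "street", "type": "string"},
                      {"name": "city", "type": "string"},
                      {"name": "position1", "type": "some.domain.location"},
                      {"name": "position2", "type": "some.domain.location"}]}
library = {"some.domain.location": location}

good = inline(address, library, once=True)
check_single_definitions(good)    # passes
print(good["fields"][3]["type"])  # some.domain.location

bad = inline(address, library, once=False)
try:
    check_single_definitions(bad)
except ValueError as err:
    print(err)                    # Can't redefine: some.domain.location
```

Serialising `good` with `json.dumps` gives one self-contained address schema in which position1 carries the full location definition and position2 is just the name "some.domain.location".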

>
>cheers,
>Peter
>