Avro >> mail # dev >> Re: Effort towards Avro 2.0?


Re: Effort towards Avro 2.0?
On Tue, Dec 3, 2013 at 7:49 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 2, 2013 at 1:42 PM, Christophe Taton
> <[EMAIL PROTECTED]> wrote:
> > - New extension data type, similar to ProtocolBuffer extensions
> (incompatible change).
>
> Extensions might be implemented as something like:
>
>   {"type":"record", "name":"extension", "fields":[
>     {"name":"fingerprint", "type": {"type":"fixed", "size":16}},
>     {"name":"payload", "type":"bytes"}
>     ]
>   }
>
> One could then use this with:
>
>   {"type":"record", "name":"Foo", "fields":[
>     {"name":"bar", "type":"extension"}
>     ]
>   }
>
> The implementation could then find the schema for the extension at
> runtime given its fingerprint.  The reader could have a table mapping
> fingerprints to schemas.
>
> In particular, the specific compiler, when it sees a schema like:
>
>
>   {"type":"record", "name":"Bar", "isExtension":true, "fields":[
>     {"name":"x", "type":"long"}
>     ]
>   }
>
> Might emit code to add entries to the extension mapping table used by
> SpecificDatumReader, e.g.:
>
>   static {
>     SpecificData.addExtension(getSchema());
>   }
>
> Might something like this work?
>

Yes, this is very much the idea.
In a prototype I made a few months ago, I found it useful to let the user
specify the fingerprint schema: in some scenarios, an extension could be
prefixed by a string that contains the JSON schema; in others, I may want
to use fingerprints to identify the schema of the extension; in yet other
cases, I may want to use an external mapping maintained by another system
(e.g. the schema repository being worked on in AVRO-1124).
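
A minimal sketch of the three identification scenarios above, in plain Java
with no Avro dependency. Everything here is hypothetical and not part of
Avro: the ExtensionSchemaResolver interface, the three resolver classes, and
the choice of MD5 over the raw JSON text as the 16-byte fingerprint (a real
implementation would more likely fingerprint a canonical form of the schema).
Schemas are kept as JSON strings to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical strategy: maps an extension's identifier bytes to its JSON schema. */
interface ExtensionSchemaResolver {
  String resolve(byte[] identifier);
}

/** Scenario 1: the identifier is the JSON schema itself, written as a string prefix. */
final class InlineJsonResolver implements ExtensionSchemaResolver {
  @Override
  public String resolve(byte[] identifier) {
    return new String(identifier, StandardCharsets.UTF_8);
  }
}

/** Scenario 2: the identifier is a 16-byte fingerprint looked up in a local table. */
final class FingerprintResolver implements ExtensionSchemaResolver {
  private final Map<String, String> table = new HashMap<>();

  /** Registers a schema and returns its fingerprint (assumed: MD5 of the JSON text). */
  byte[] register(String schemaJson) {
    byte[] fp = fingerprint(schemaJson);
    table.put(hex(fp), schemaJson);
    return fp;
  }

  @Override
  public String resolve(byte[] identifier) {
    String schema = table.get(hex(identifier));
    if (schema == null) {
      throw new IllegalArgumentException("Unknown extension fingerprint: " + hex(identifier));
    }
    return schema;
  }

  static byte[] fingerprint(String schemaJson) {
    try {
      return MessageDigest.getInstance("MD5")
          .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // MD5 is required on every Java platform
    }
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}

/** Scenario 3: the identifier is a key resolved by an external system (e.g. a repository). */
final class ExternalResolver implements ExtensionSchemaResolver {
  private final Function<String, String> lookup;

  ExternalResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  @Override
  public String resolve(byte[] identifier) {
    String key = new String(identifier, StandardCharsets.UTF_8);
    String schema = lookup.apply(key);
    if (schema == null) {
      throw new IllegalArgumentException("No schema registered for key: " + key);
    }
    return schema;
  }
}
```

With this shape, a reader would be configured with one resolver, and the
type of the identifier field (string, fixed(16), or some repository key)
would follow from the resolver chosen.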

C.