Avro >> mail # dev >> Re: Effort towards Avro 2.0?


Re: Effort towards Avro 2.0?
On Tue, Dec 3, 2013 at 7:49 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 2, 2013 at 1:42 PM, Christophe Taton
> <[EMAIL PROTECTED]> wrote:
> > - New extension data type, similar to ProtocolBuffer extensions
> (incompatible change).
>
> Extensions might be implemented as something like:
>
>   {"type":"record", "name":"extension", "fields":[
>     {"name":"fingerprint", "type": {"type":"fixed", "size":16}},
>     {"name":"payload", "type":"bytes"}
>     ]
>   }
>
> One could then use this with:
>
>   {"type":"record", "name":"Foo", "fields":[
>     {"name":"bar", "type":"extension"}
>     ]
>   }
>
> The implementation could then find the schema for the extension at
> runtime given its fingerprint.  The reader could have a table mapping
> fingerprints to schemas.
>
> In particular, the specific compiler, when it sees a schema like:
>
>
>   {"type":"record", "name":"Bar", "isExtension":true, "fields":[
>     {"name":"x", "type":"long"}
>     ]
>   }
>
> might emit code to add entries to the extension mapping table used by
> SpecificDatumReader, e.g.:
>
>   static {
>     SpecificData.addExtension(getSchema());
>   }
>
> Might something like this work?
>

Yes, this is very much the idea.
In a prototype I made a few months ago, I found it useful to let the user
specify the schema of the fingerprint itself: in some scenarios, an
extension could be prefixed by a string that contains the JSON schema; in
others, I may want to use fingerprints to identify the schema of the
extension; and in yet others, I may want to use an external mapping
maintained by another system (e.g. the schema repository being worked on in
AVRO-1124).
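
To make the three scenarios concrete, here is a minimal sketch of how a pluggable lookup might be shaped. It is not the prototype's code and not Avro API: SchemaResolver, InlineJsonResolver, FingerprintResolver, ExternalResolver and fingerprint() are all hypothetical names, schemas are kept as plain JSON strings so the example needs no Avro dependency, and the MD5 hash over the raw JSON text is only an assumption chosen because it yields the 16 bytes of the "fixed" field above (a real implementation would presumably hash a normalized form of the schema).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Avro.
final class ExtensionSchemaLookup {

    /** Maps the prefix stored in front of an extension payload to schema JSON. */
    interface SchemaResolver {
        String resolve(byte[] prefix);
    }

    /** Scenario 1: the prefix is the JSON schema itself, as UTF-8 text. */
    static final class InlineJsonResolver implements SchemaResolver {
        @Override
        public String resolve(byte[] prefix) {
            return new String(prefix, StandardCharsets.UTF_8);
        }
    }

    /** Scenario 2: the prefix is a 16-byte fingerprint found in a local table. */
    static final class FingerprintResolver implements SchemaResolver {
        // ByteBuffer compares by content, so it can serve as a map key for byte[].
        private final Map<ByteBuffer, String> table = new HashMap<>();

        void register(String schemaJson) {
            table.put(ByteBuffer.wrap(fingerprint(schemaJson)), schemaJson);
        }

        @Override
        public String resolve(byte[] prefix) {
            String schemaJson = table.get(ByteBuffer.wrap(prefix));
            if (schemaJson == null) {
                throw new IllegalArgumentException("unknown extension fingerprint");
            }
            return schemaJson;
        }
    }

    /** Scenario 3: the prefix is an ID resolved by some external system. */
    static final class ExternalResolver implements SchemaResolver {
        private final Function<String, String> lookup;

        ExternalResolver(Function<String, String> lookup) {
            this.lookup = lookup;
        }

        @Override
        public String resolve(byte[] prefix) {
            return lookup.apply(new String(prefix, StandardCharsets.UTF_8));
        }
    }

    /** Assumption: MD5 over the raw JSON text, giving exactly 16 bytes. */
    static byte[] fingerprint(String schemaJson) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With something of this shape, the reader would only need to be told which SchemaResolver applies to a given extension field; the payload bytes would then be decoded with whatever schema the resolver returns.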

C.