Avro >> mail # dev >> Re: Effort towards Avro 2.0?

Re: Effort towards Avro 2.0?
Hi Douglas,

When you write a middleware that lets users define custom types, extensions
are pretty much required.

Middleware doesn't need to, and shouldn't need to, know these user-defined
custom types ahead of time: you don't want to rebuild and restart your
middleware every time a user defines a new type they want handled by the
middleware.

An explicit bytes field always works, but is both inefficient and unwieldy:

   - inefficient because you'll end up serializing your data twice, once
   from the actual type into the bytes field, then a second time as a bytes
   field inside the enclosing record
   - unwieldy because as a user, I'll have to encode and decode the bytes
   field manually every time I want to access this field from the original
   record, unless I keep track of the decoded extension externally to the
   Avro record
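To make the double-serialization cost concrete, here is a minimal sketch. It uses JSON as a stand-in for the real Avro binary encoding, and the record and field names (`GeoPoint`, `extension`, `id`) are purely illustrative:

```python
import json

# Hypothetical user-defined extension value (names are illustrative).
inner_record = {"user_type": "GeoPoint", "lat": 48.85, "lon": 2.35}

# First serialization: the extension value is encoded into opaque bytes.
payload = json.dumps(inner_record).encode("utf-8")

# Second serialization: the enclosing record, carrying those bytes as a
# field, is serialized again as a whole (hex-encoding stands in for a
# bytes field here).
outer_record = {"id": 42, "extension": payload.hex()}
wire = json.dumps(outer_record).encode("utf-8")

# Reading the extension back takes two decode steps: first the enclosing
# record, then the manually decoded bytes field.
decoded_outer = json.loads(wire)
decoded_inner = json.loads(bytes.fromhex(decoded_outer["extension"]))
```

The two encode steps on the write path and two decode steps on the read path are exactly the inefficiency and unwieldiness described above.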

On Wed, Dec 4, 2013 at 8:07 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 3, 2013, at 07:49 AM, Doug Cutting wrote:
> > On Mon, Dec 2, 2013 at 1:42 PM, Christophe Taton
> > <[EMAIL PROTECTED]> wrote:
> > > - New extension data type, similar to ProtocolBuffer extensions
> (incompatible change).
> >
> > Extensions might be implemented as something like:
> >
> >   {"type":"record", "name":"extension", "fields":[
> >     {"name":"fingerprint", "type": {"type":"fixed", "size":16}},
> >     {"name":"payload", "type":"bytes"}
> >     ]
> >   }
> I'd also want to know more about the kind of use cases that you'd need
> protobuf-style extensions for.  I like Doug's solution if each record
> can have a different set of extensions.  If all of the records will have
> the same set of extensions, my hunch is that you'd only need to use
> extra fields and schema resolution.  Either way, I can't think of a use
> case where a new data type in the spec is a noticeable improvement.
> –doug
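A rough sketch of how the extension record Doug proposes above might be used. Assumptions: MD5 is one of the 16-byte schema fingerprints the Avro spec defines, JSON stands in for the real Avro binary encoding of the payload, and the `GeoPoint` schema and registry are hypothetical:

```python
import hashlib
import json

# Hypothetical extension schema (names are illustrative).
ext_schema = json.dumps({
    "type": "record", "name": "GeoPoint",
    "fields": [{"name": "lat", "type": "double"},
               {"name": "lon", "type": "double"}],
})

# 16-byte fingerprint of the writer's schema, matching the
# {"type": "fixed", "size": 16} field in Doug's sketch.
fingerprint = hashlib.md5(ext_schema.encode("utf-8")).digest()

# The serialized extension value; JSON stands in for a real Avro
# binary encoding here.
payload = json.dumps({"lat": 48.85, "lon": 2.35}).encode("utf-8")

extension = {"fingerprint": fingerprint, "payload": payload}

# A reader that knows the fingerprint -> schema mapping can resolve the
# payload; middleware that doesn't can skip or pass it through untouched.
registry = {fingerprint: ext_schema}
writer_schema = registry.get(extension["fingerprint"])
```

The point of the fingerprint is that middleware can forward or skip extensions it doesn't recognize, while readers that do know the schema can resolve the payload, without the spec needing a new data type.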