Avro dev mailing list: Re: Effort towards Avro 2.0?


Messages in this thread:
   Doug Cutting 2013-12-03, 15:49
   Christophe Taton 2013-12-04, 18:24
   Douglas Creager 2013-12-04, 16:07
Re: Effort towards Avro 2.0?
Hi Douglas,

When you write middleware that lets users define custom types, extensions
are pretty much required.

Middleware doesn't need to, and shouldn't need to, know these user-defined
custom types ahead of time: you don't want to rebuild and restart your
middleware every time a user defines a new type they want handled by the
middleware.

An explicit bytes field always works, but it is both inefficient and unwieldy:

   - inefficient because you'll end up serializing your data twice: once from
   the actual type into raw bytes, then a second time as a bytes field of the
   containing record;
   - unwieldy because, as a user, I'll have to encode and decode the bytes
   field manually every time I want to access this field from the original
   record, unless I keep track of the decoded extension externally to the Avro
   record (see the sketch below).
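
A minimal sketch of this wrap/unwrap dance with Avro's Java GenericData API
(the "UserEvent" and "Container" schemas here are made up for illustration;
the container just carries the payload in a plain "payload" bytes field):

  import java.io.ByteArrayOutputStream;
  import java.nio.ByteBuffer;

  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.io.BinaryDecoder;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.EncoderFactory;

  public class BytesFieldWorkaround {
    // Hypothetical user-defined type that the middleware knows nothing about.
    static final Schema USER_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"UserEvent\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"}]}");

    // The record the middleware actually sees: the payload is opaque bytes.
    static final Schema CONTAINER_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Container\",\"fields\":["
        + "{\"name\":\"payload\",\"type\":\"bytes\"}]}");

    static byte[] serialize(Schema schema, GenericRecord datum) throws Exception {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericRecord>(schema).write(datum, enc);
      enc.flush();
      return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
      // First pass: the user type is encoded into raw bytes.
      GenericRecord event = new GenericData.Record(USER_SCHEMA);
      event.put("id", 42L);
      byte[] inner = serialize(USER_SCHEMA, event);

      // Second pass: the already-encoded bytes are written again,
      // length-prefixed, as the container's bytes field.
      GenericRecord container = new GenericData.Record(CONTAINER_SCHEMA);
      container.put("payload", ByteBuffer.wrap(inner));
      byte[] wire = serialize(CONTAINER_SCHEMA, container);

      // Reading it back: every access to the payload needs a manual second decode.
      BinaryDecoder dec = DecoderFactory.get().binaryDecoder(wire, null);
      GenericRecord decoded =
          new GenericDatumReader<GenericRecord>(CONTAINER_SCHEMA).read(null, dec);
      ByteBuffer payload = (ByteBuffer) decoded.get("payload");
      byte[] innerBytes = new byte[payload.remaining()];
      payload.get(innerBytes);
      GenericRecord decodedEvent = new GenericDatumReader<GenericRecord>(USER_SCHEMA)
          .read(null, DecoderFactory.get().binaryDecoder(innerBytes, null));
      System.out.println(decodedEvent.get("id"));
    }
  }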

C.
On Wed, Dec 4, 2013 at 8:07 AM, Douglas Creager <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 3, 2013, at 07:49 AM, Doug Cutting wrote:
> > On Mon, Dec 2, 2013 at 1:42 PM, Christophe Taton
> > <[EMAIL PROTECTED]> wrote:
> > > - New extension data type, similar to ProtocolBuffer extensions
> (incompatible change).
> >
> > Extensions might be implemented as something like:
> >
> >   {"type":"record", "name":"extension", "fields":[
> >     {"name":"fingerprint", "type": {"type":"fixed", "size":16}},
> >     {"name":"payload", "type":"bytes"}
> >     ]
> >   }
>
> I'd also want to know more about the kind of use cases that you'd need
> protobuf-style extensions for.  I like Doug's solution if each record
> can have a different set of extensions.  If all of the records will have
> the same set of extensions, my hunch is that you'd only need to use
> extra fields and schema resolution.  Either way, I can't think of a use
> case where a new data type in the spec is a noticeable improvement.
>
> –doug
>
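
For concreteness, here is a minimal sketch of how such an extension envelope
might be filled in and unwrapped with the Java GenericData API. The "md5"
name on the fixed type (named types need a name to parse) and the use of MD5
parsing fingerprints via SchemaNormalization are assumptions for illustration,
not part of the proposal:

  import java.io.ByteArrayOutputStream;
  import java.nio.ByteBuffer;

  import org.apache.avro.Schema;
  import org.apache.avro.SchemaNormalization;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.io.BinaryDecoder;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.EncoderFactory;

  public class ExtensionEnvelope {
    // Doug's sketch, with a "md5" name added to the fixed type so it parses.
    static final Schema EXTENSION = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"extension\",\"fields\":["
        + "{\"name\":\"fingerprint\",\"type\":"
        + "{\"type\":\"fixed\",\"name\":\"md5\",\"size\":16}},"
        + "{\"name\":\"payload\",\"type\":\"bytes\"}]}");

    // Producer side: wrap a datum of any user-defined schema into the envelope.
    static GenericRecord wrap(Schema userSchema, GenericRecord datum) throws Exception {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericRecord>(userSchema).write(datum, enc);
      enc.flush();

      GenericRecord ext = new GenericData.Record(EXTENSION);
      ext.put("fingerprint", new GenericData.Fixed(
          EXTENSION.getField("fingerprint").schema(),
          SchemaNormalization.parsingFingerprint("MD5", userSchema)));
      ext.put("payload", ByteBuffer.wrap(out.toByteArray()));
      return ext;
    }

    // Consumer side: the fingerprint identifies the payload's schema, which the
    // middleware can look up at runtime (e.g. in a schema registry) instead of
    // having every user type compiled in ahead of time.
    static GenericRecord unwrap(GenericRecord ext, Schema resolvedSchema) throws Exception {
      ByteBuffer payload = (ByteBuffer) ext.get("payload");
      byte[] bytes = new byte[payload.remaining()];
      payload.get(bytes);
      BinaryDecoder dec = DecoderFactory.get().binaryDecoder(bytes, null);
      return new GenericDatumReader<GenericRecord>(resolvedSchema).read(null, dec);
    }
  }

The fingerprint only identifies the payload's schema; how the consumer maps a
fingerprint back to a schema (a runtime registry, a side channel, etc.) is
left open here.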
Later replies in this thread:
   Douglas Creager 2013-12-04, 20:18
   Christophe Taton 2013-12-05, 07:40
   Doug Cutting 2013-12-12, 17:57