Avro, mail # dev - Re: Effort towards Avro 2.0?

Douglas Creager 2013-12-04, 20:18
>    - inefficient because you'll end up serializing your data twice, once
>    from the actual type into the bytes field, then a second time as a
>    bytes field;

I don't think it's as inefficient as you might think: the second
serialization just blits the raw bytes content into some destination
buffer/pipe/socket/etc.  The C binding already does this under the
covers to handle blocks when writing into a data file, and it hasn't
been a performance bottleneck.
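
To make the "blit" concrete, here is a minimal sketch in plain Python
(not the C binding's actual code).  It relies only on the Avro binary
encoding rule that a bytes value is written as a zigzag varint length
followed by the raw bytes.  The names write_long and write_bytes, and
the sample payload, are hypothetical and chosen for illustration.

```python
import io

def write_long(out, n):
    """Write n as an Avro long: zigzag-encode, then emit a base-128 varint."""
    n = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes stay small
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))  # low 7 bits, continuation bit set
        n >>= 7
    out.write(bytes([n]))             # final group, continuation bit clear

def write_bytes(out, payload):
    """Write an Avro bytes value: length prefix, then a straight copy."""
    write_long(out, len(payload))
    out.write(payload)                # the "second serialization" is just this copy

# Hypothetical payload standing in for an already-serialized extension.
inner = b"\x02\x06foo"
buf = io.BytesIO()
write_bytes(buf, inner)
encoded = buf.getvalue()
```

The only per-field work beyond the copy is the length prefix, so the
cost grows with the payload size, not with the complexity of the
extension's schema.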

>    - unwieldy because as a user, I'll have to encode and decode the bytes
>    field manually every time I want to access this field from the original
>    record, unless I keep track of the decoded extension externally to the
>    Avro record.

Can you handle this in the middleware?  I.e., have the middleware decode
the bytes field before passing control to the user code.  That's better
from a decoupling standpoint anyway, since the user code shouldn't care
what middleware is wrapping it.
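
A rough sketch of what I mean, in plain Python.  Everything here is
hypothetical: decoding_middleware, user_handler, and the "extension"
field name are made up for illustration, and json.loads merely stands
in for whatever decoder the extension's own schema would require.

```python
import json

def decoding_middleware(handler, field, decode):
    """Wrap handler so that it receives `field` already decoded."""
    def wrapped(record):
        decoded = dict(record)                  # shallow copy; caller's record is untouched
        decoded[field] = decode(record[field])  # raw bytes -> decoded extension value
        return handler(decoded)
    return wrapped

def user_handler(record):
    # User code sees a normal value and never touches the raw bytes.
    return record["extension"]["color"]

# Assumption: json.loads is a stand-in decoder, not how Avro would decode the field.
handle = decoding_middleware(user_handler, "extension", json.loads)
result = handle({"id": 7, "extension": b'{"color": "red"}'})
```

The user handler has no dependency on the middleware or on the wire
format of the extension, which is the decoupling I'm after.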

> When you write a middleware that lets users define custom types,
> extensions are pretty much required.

I guess my main point is that we already have two mechanisms for dealing
with user extensions (schema resolution and Doug's bytes field
proposal), both of which work just fine at runtime without rebuilding or
restarting your code.  In general, I think it's better if we can solve a
problem at the library or application level, without having to update
the spec.