Effort towards Avro 2.0?
Hi all,

Avro, in its current form, has a number of limitations that are hard to
work with or around, and hard to fix within the scope of Avro 1.x: fixing
them would introduce incompatible changes that warrant a major version
bump, i.e. Avro 2.0. An Avro 2.0 branch would be an opportunity to address
most of the issues that have been held back so far for compatibility
reasons.

I would like to initiate an effort in this direction and I am willing to do
the necessary work to gather and organize requirements, and draft a design
document for what Avro 2.0 would look like. For this reason, if you have
opinions regarding an Avro 2.0 branch or regarding issues and features that
could fit in Avro 2.0, please reply to this thread.

To bootstrap, below is a list I gathered over the last couple of years from
several discussions:

   - Specification
      - Improved support for unions (incompatible change with named unions
      and union properties).
      - New extension data type, similar to Protocol Buffers extensions
      (incompatible change).
      - Clear separation between Avro schema (data format) and specific API
      client concerns: for example, the way Avro strings are exposed through
      the Java API should not pollute the schema definition. Each particular
      Java client should configure their own decoders with the way they want
      Avro strings to be represented.
      - Clarification of compatibility and type promotion (safe lossless
      conversions vs. best-effort lossy conversions): promoting int to float
      potentially loses precision, which is not necessarily acceptable for
      all clients. Avro decoders should let clients configure which mode
      they need.
   - IDL
      - Generalized IDL for Avro schemas.
      - Support for recursive records.
      - Meta-schema: IDL definition for a schema.
   - Java API
      - Truly immutable schema objects (no properties / hashcode mutation
      after construction).
      - Immutable records.
      - Complete record builder API (current record builders do not play
      well with nested records).
      - Complete generic API (there currently is no GenericUnion or
      GenericMap).
      - Improved unions support: union values as java.lang.Object are less
      than ideal; union values could expose the union branch through an enum
      (nulls could be handled specifically).
   - Python 3 support
   - RPC
      - SASL support
      - Full Python/Java parity and interoperability.
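
To make the type promotion item concrete, here is a minimal Java sketch of
the precision problem and of what a configurable mode might look like. It
uses plain Java only; PromotionSketch, PromotionMode, fitsInFloat and
promoteIntToFloat are hypothetical names chosen for illustration and are
not part of any existing Avro API.

```java
public final class PromotionSketch {

    /** Hypothetical decoder setting; nothing like this exists in Avro 1.x. */
    public enum PromotionMode { LOSSLESS, BEST_EFFORT }

    private PromotionSketch() {}

    /**
     * Returns true if the int is exactly representable as a float.
     * Both sides are widened to double (which holds every int and every
     * float exactly) so the comparison itself cannot lose precision.
     */
    public static boolean fitsInFloat(int value) {
        return (double) (float) value == (double) value;
    }

    /**
     * Promotes an int to a float. In LOSSLESS mode an inexact conversion
     * is rejected; in BEST_EFFORT mode the nearest float is returned.
     */
    public static float promoteIntToFloat(int value, PromotionMode mode) {
        if (mode == PromotionMode.LOSSLESS && !fitsInFloat(value)) {
            throw new ArithmeticException(
                "int " + value + " is not exactly representable as a float");
        }
        return (float) value;
    }

    public static void main(String[] args) {
        int exact = 1 << 24;          // 16777216, fits in a float's 24-bit significand
        int inexact = (1 << 24) + 1;  // 16777217, needs 25 bits

        System.out.println(fitsInFloat(exact));    // true
        System.out.println(fitsInFloat(inexact));  // false

        // Best-effort promotion silently rounds to the nearest float.
        float rounded = promoteIntToFloat(inexact, PromotionMode.BEST_EFFORT);
        System.out.println(rounded == (float) exact);  // true
    }
}
```

A reader that needs exact values would ask for the LOSSLESS behaviour and
get an error instead of a silently rounded number, while a reader that only
cares about approximate magnitudes would keep BEST_EFFORT.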

Please comment on or extend this list. Provided there is enough interest,
I'll happily digest the feedback and organize it into a document (most
likely a wiki page?).

Thanks,
Christophe