Avro >> mail # user >> Scala API


Hi people,

Is there interest in a custom Scala API for Avro records and protocols?
I am currently working on a schema compiler for Scala, but before I go
deeper, I would really like to get some external feedback.
I would especially like to hear from anyone who has opinions on how to map
Avro types onto Scala types.
Here are a few hints on what I've been trying so far (a rough sketch of the
generated code follows the list):

   - Records are compiled into two forms: mutable and immutable.
   - To avoid collisions with the Java-generated classes, Scala classes are
   generated in a .scala sub-package.
   - Avro arrays are translated to Seq/List when immutable and
   Buffer/ArrayBuffer when mutable.
   - Avro maps are translated to immutable or mutable Map/HashMap.
   - Bytes/Fixed are translated to Seq[Byte] when immutable and
   Buffer[Byte] when mutable.
   - Avro unions are currently translated into Any, but I plan to:
      - translate union {null, X} into Scala Option[X]
      - compile union {T1, T2, T3} into custom case classes to get
      proper type checking and pattern matching.
   - Scala records provide an encode(encoder) method to serialize the record
   as binary into a byte stream (appears ~30% faster than SpecificDatumWriter).
   - Mutable Scala records provide a decode(decoder) method to deserialize
   from a byte stream (appears ~25% faster than SpecificDatumReader).
   - Scala records implement the SpecificRecord Java interface (with some
   overhead), so one may still use the SpecificDatumReader/Writer when the
   custom encoder/decoder methods cannot be used.
   - Mutable records can be converted to immutable ones (i.e. they can act as
   builders).
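
To make the mapping more concrete, below is a rough, purely illustrative
sketch of what the compiler could emit for a small record. The schema, the
class and package names, and the stubbed method bodies are all made up for
this example; only the type mapping follows the rules listed above.

   // Hypothetical schema (illustrative only):
   //   record User { string name; union { null, int } age; array<string> emails; }

   package example.scala  // hypothetical ".scala" sub-package for generated classes

   // _root_ avoids the enclosing "scala" package name shadowing the standard library
   import _root_.scala.collection.mutable
   import org.apache.avro.io.{Decoder, Encoder}

   // Immutable form: array -> Seq, union {null, int} -> Option[Int]
   // (the real generated classes would also implement SpecificRecord; omitted here)
   case class User(name: String, age: Option[Int], emails: Seq[String]) {
     // Writes this record as Avro binary (generated body omitted)
     def encode(encoder: Encoder): Unit = { /* generated serialization code */ }
   }

   // Mutable form: array -> Buffer; doubles as a builder for the immutable form
   class MutableUser {
     var name: String = ""
     var age: Option[Int] = None
     var emails: mutable.Buffer[String] = mutable.ArrayBuffer.empty[String]

     // Fills in the fields from Avro binary (generated body omitted)
     def decode(decoder: Decoder): Unit = { /* generated deserialization code */ }

     // Mutable-to-immutable conversion, so the mutable form acts as a builder
     def toUser: User = User(name, age, emails.toList)
   }

Reading would then amount to creating a MutableUser, calling decode with a
BinaryDecoder obtained from Avro's DecoderFactory, and converting the result
to the immutable form with toUser.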

Thanks,
Christophe