Avro, mail # user - Scala API


Re: Scala API
Scott Carey 2012-05-30, 21:52
This would be fantastic.  I would be excited to see it.  It would be great
to see a Scala language addition to the project if you wish to contribute.

I believe there have been a few other Scala Avro attempts over time.  I
recall one where all records were case classes (but this broke for records
with more than 22 fields, the case class limit in Scala at the time).
Another thing to look at is:
http://code.google.com/p/avro-scala-compiler-plugin/

Perhaps we can get a few of the other people who have developed Scala Avro
tools to review/comment or contribute as well?

On 5/29/12 11:04 PM, "Christophe Taton" <[EMAIL PROTECTED]> wrote:

> Hi people,
>
> Is there interest in a custom Scala API for Avro records and protocols?
> I am currently working on a schema compiler for Scala, but before I go
> deeper, I would really like to have external feedback.
> I would especially like to hear from anyone who has opinions on how to map
> Avro types onto Scala types.
> Here are a few hints on what I've been trying so far:
> * Records are compiled into two forms: mutable and immutable.
Very nice.
> * To avoid collisions with Java generated classes, Scala classes are generated
> in a .scala sub-package.
> * Avro arrays are translated to Seq/List when immutable and Buffer/ArrayBuffer
> when mutable.
> * Avro maps are translated to immutable or mutable Map/HashMap.
> * Bytes/Fixed are translated to Seq[Byte] when immutable and Buffer[Byte] when
> mutable.
> * Avro unions are currently translated into Any, but I plan to:
>> * translate union{null, X} into Scala Option[X]
>> * compile union {T1, T2, T3} into custom case classes to have proper type
>> checking and pattern matching.
If you have a record R1, it compiles to a Scala class.  If you put it in a
union of {R1, String}, what does the case class for the union look like?  Is
it basically a wrapper like a specialized Either[R1, String]?  Maybe Scala
will get union types later to push this into the compiler instead of object
instances :)
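
To make the question concrete, here is one possible shape for such a wrapper:
a sealed trait with one case class per union branch.  This is only a sketch of
the idea, not the output of the proposed compiler; the names R1OrString, Rec,
Str, UnionSketch, describe and fromNullable are all made up for illustration,
and the fields of R1 are assumed.

```scala
// Hypothetical record; the fields are assumptions for the example.
final case class R1(id: Int, name: String)

// One sealed parent per union, one case class per branch.
// This plays the role of a specialized Either[R1, String].
sealed trait R1OrString
object R1OrString {
  final case class Rec(value: R1) extends R1OrString
  final case class Str(value: String) extends R1OrString
}

object UnionSketch {
  // Because the trait is sealed, the compiler can warn about a missing branch.
  def describe(u: R1OrString): String = u match {
    case R1OrString.Rec(r) => s"record ${r.name}"
    case R1OrString.Str(s) => s"string $s"
  }

  // union {null, X} needs no wrapper class: Option(x) maps null to None.
  def fromNullable(s: String): Option[String] = Option(s)
}
```

Compared with Either, this costs one wrapper allocation per value as well, but
it extends naturally to three or more branches and gives each branch a name.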
> * Scala records provide a method encode(encoder) to serialize as binary into a
> byte stream (appears ~30% faster than SpecificDatumWriter).
> * Scala mutable records provide a method decode(decoder) to deserialize a byte
> stream (appears ~25% faster than SpecificDatumReader).
I have some plans to improve {Generic,Specific}Datum{Reader,Writer}  in
Java, I would be interested in seeing how the Scala one here works.  The
Java one is slowed by traversing too many data structures that represent
decisions that could be pre-computed rather than repeatedly parsed for each
record.
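
The pre-computation idea can be sketched roughly as follows: walk the schema
once, build one small write step per field, and reuse those steps for every
record.  This is a toy model under loud assumptions: FieldType, RecordSchema
and PrecomputedWriter are invented here and are not Avro classes, records are
plain Maps, and the bytes come from java.io.DataOutputStream rather than
Avro's binary encoding.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Toy schema model (an assumption for this sketch, NOT Avro's Schema).
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType
final case class RecordSchema(fields: Vector[(String, FieldType)])

object PrecomputedWriter {
  type Record = Map[String, Any]
  type Writer = (Record, DataOutputStream) => Unit

  // The type dispatch happens here, once per field, at compile() time.
  private def step(name: String, tpe: FieldType): Writer = tpe match {
    case IntField    => (r, out) => out.writeInt(r(name).asInstanceOf[Int])
    case StringField => (r, out) => out.writeUTF(r(name).asInstanceOf[String])
  }

  // Returns a writer that only runs the prepared steps for each record.
  def compile(schema: RecordSchema): Writer = {
    val steps: Array[Writer] =
      schema.fields.map { case (n, t) => step(n, t) }.toArray
    (r, out) => {
      var i = 0
      while (i < steps.length) { steps(i)(r, out); i += 1 }
    }
  }

  // Convenience helper: run a writer against an in-memory buffer.
  def toBytes(w: Writer, r: Record): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    w(r, out)
    out.flush()
    buf.toByteArray
  }
}
```

A generated encode(encoder) method goes one step further, since the per-field
steps are fixed in source code instead of being held in an array of closures.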
> * Scala records implement the SpecificRecord Java interface (with some
> overhead), so one may still use the SpecificDatumReader/Writer when the custom
> encoder/decoder methods cannot be used.
> * Mutable records can be converted to immutable (i.e. can act as builders).
> Thanks,
> Christophe
>
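
For readers following the thread, the type mapping described in the list above
might look roughly like this for a small record.  It is a hand-written sketch,
not output of the compiler under discussion: the User record, its fields, and
the build() method name are assumptions chosen for the example.

```scala
import scala.collection.mutable

// Immutable form of a hypothetical record with fields
//   name: string, email: union {null, string}, tags: array<string>
final case class User(name: String, email: Option[String], tags: Seq[String])

// Mutable form of the same record; it doubles as a builder.
final class MutableUser(
    var name: String = "",
    var email: Option[String] = None,
    val tags: mutable.Buffer[String] = mutable.ArrayBuffer.empty[String]
) {
  // Copies the buffer so later mutation does not leak into the snapshot.
  def build(): User = User(name, email, tags.toList)
}
```

A possible usage: create a MutableUser, set name, append to tags, then call
build() to get an immutable User that is safe to share.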