Avro >> mail # user >> Scala API


Christophe Taton 2012-05-30, 06:04
Philip Zeyliger 2012-05-30, 16:34
Michael Armbrust 2012-06-05, 21:53
Re: Scala API
This would be fantastic.  I would be excited to see it.  It would be great
to see a Scala language addition to the project if you wish to contribute.

I believe there have been a few other Scala Avro attempts by others over
time.   I recall one where all records were case classes (but this broke at
22 fields).
Another thing to look at is:
http://code.google.com/p/avro-scala-compiler-plugin/

Perhaps we can get a few of the other people who have developed Scala Avro
tools to review/comment or contribute as well?

On 5/29/12 11:04 PM, "Christophe Taton" <[EMAIL PROTECTED]> wrote:

> Hi people,
>
> Is there interest in a custom Scala API for Avro records and protocols?
> I am currently working on a schema compiler for Scala, but before I go
> deeper, I would really like to have external feedback.
> I would especially like to hear from anyone who has opinions on how to map
> Avro types onto Scala types.
> Here are a few hints on what I've been trying so far:
> * Records are compiled into two forms: mutable and immutable.
Very nice.
> * To avoid collisions with Java generated classes, scala classes are generated
> in a .scala sub-package.
> * Avro arrays are translated to Seq/List when immutable and Buffer/ArrayBuffer
> when mutable.
> * Avro maps are translated to immutable or mutable Map/HashMap.
> * Bytes/Fixed are translated to Seq[Byte] when immutable and Buffer[Byte] when
> mutable.
> * Avro unions are currently translated into Any, but I plan to:
>> * translate union{null, X} into Scala Option[X]
>> * compile union {T1, T2, T3} into custom case classes to have proper type
>> checking and pattern matching.
If you have a record R1, it compiles to a Scala class.  If you put it in a
union of {R1, String}, what does the case class for the union look like?  Is
it basically a wrapper like a specialized Either[R1, String]?  Maybe Scala
will get union types later, which would push this into the compiler instead of
wrapper object instances :)
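
To make the question concrete, here is a minimal sketch of what the two union
mappings might look like. Every name in it (R1, Person, R1OrString, FromR1,
FromString, UnionSketch) is hypothetical and not taken from Christophe's
compiler; it only illustrates Option for union{null, X} and a sealed wrapper
for a wider union.

```scala
// Hypothetical stand-in for a compiled Avro record R1.
final case class R1(id: Int)

// Assumed mapping: union {null, string} becomes Option[String].
final case class Person(name: String, nickname: Option[String])

// Assumed mapping: union {R1, string} becomes a sealed wrapper,
// much like a specialized Either[R1, String].
sealed trait R1OrString
final case class FromR1(value: R1) extends R1OrString
final case class FromString(value: String) extends R1OrString

object UnionSketch {
  // Because the trait is sealed, the compiler can check this match
  // for exhaustiveness.
  def describe(u: R1OrString): String = u match {
    case FromR1(r)     => s"record with id ${r.id}"
    case FromString(s) => s"string '$s'"
  }
}
```

With this shape, each branch costs one wrapper object per value, which is the
overhead that compiler-level union types would avoid.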
> * Scala records provide a method encode(encoder) to serialize as binary into a
> byte stream (appears ~30% faster than SpecificDatumWriter).
> * Scala mutable records provide a method decode(decoder) to deserialize a byte
> stream (appears ~25% faster than SpecificDatumReader).
I have some plans to improve {Generic,Specific}Datum{Reader,Writer} in
Java, so I would be interested in seeing how the Scala one here works.  The
Java one is slowed by traversing too many data structures that represent
decisions that could be pre-computed once rather than repeatedly re-evaluated
for each record.
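
As a rough illustration of what I mean by pre-computing, here is a toy sketch.
It does not use Avro's Schema API or its binary encoding, and it is not how
either the Java or the Scala code works today; the schema model, the Writer
type and PrecomputedWriter are all made up for the example. The schema is
walked once to build a closure, and the per-record path only runs that closure.

```scala
// Toy schema model; NOT Avro's Schema API, only an illustration.
sealed trait Schema
case object IntSchema extends Schema
case object StringSchema extends Schema
final case class RecordSchema(fields: Vector[(String, Schema)]) extends Schema

object PrecomputedWriter {
  // Writes one value into a text buffer (a stand-in for a binary encoder).
  type Writer = (Any, StringBuilder) => Unit

  // Walk the schema once and return a closure. The per-record work no
  // longer inspects the schema tree.
  def compile(schema: Schema): Writer = schema match {
    case IntSchema    => (v, out) => out.append(v.asInstanceOf[Int]).append(';')
    case StringSchema => (v, out) => out.append(v.asInstanceOf[String]).append(';')
    case RecordSchema(fields) =>
      // Field writers are resolved here, once, in field order.
      val writers: Array[Writer] = fields.map { case (_, s) => compile(s) }.toArray
      (v, out) => {
        val values = v.asInstanceOf[IndexedSeq[Any]]
        var i = 0
        while (i < writers.length) { writers(i)(values(i), out); i += 1 }
      }
  }
}
```

Usage would be to call compile once per schema and then reuse the returned
function for every record, for example
`val write = PrecomputedWriter.compile(schema)` followed by
`write(Vector(1, "a"), out)` for each record.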
> * Scala records implement the SpecificRecord Java interface (with some
> overhead), so one may still use the SpecificDatumReader/Writer when the custom
> encoder/decoder methods cannot be used.
> * Mutable records can be converted to immutable (i.e., they can act as builders).
> Thanks,
> Christophe
>
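
For anyone following the thread, the mutable/immutable pairing and the
collection mappings described above might look roughly like the sketch below.
This is a guess at the shape, not the generated code: User, MutableUser and
build() are hypothetical names, and the real compiler output may differ.

```scala
import scala.collection.mutable

// Immutable form: Avro array -> Seq, Avro map -> immutable Map.
final case class User(name: String, tags: Seq[String], attrs: Map[String, Int])

// Mutable form: Avro array -> Buffer, Avro map -> mutable Map.
// It doubles as a builder for the immutable form.
final class MutableUser(
    var name: String = "",
    val tags: mutable.Buffer[String] = mutable.ArrayBuffer.empty[String],
    val attrs: mutable.Map[String, Int] = mutable.HashMap.empty[String, Int]
) {
  // Copies the collections so that later mutation of this object
  // does not leak into the immutable result.
  def build(): User = User(name, tags.toList, attrs.toMap)
}
```

The copy in build() trades some allocation for safety; a generated builder
could avoid it only if the mutable record were discarded after conversion.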
Christophe Taton 2012-05-30, 23:26