Avro, mail # user - Go library


Re: Go library
Mike Stanley 2014-03-21, 12:09
Cool.  Thanks for the response.

Quick update:

I've had early success reading Avro files with the Avro C library and
Go through cgo.  It was relatively straightforward.  It's a tad
tedious, as the new "value" interface in the C library uses a lot of
macros, and cgo cannot (AFAIK) call macros directly.  Instead, I needed
to create C wrapper functions for the macros.  I did this for
about 8 or so macros (just the ones I needed as a proof of concept,
but they cover most everything you'd expect on the reading side:
generic readers, retrieving the writer schema, iterating over
record values, teasing out union discriminants/branches, retrieving
string & long values, getting fields by index and by name, and the
corresponding incref/decref).  Aside from the macros,
integrating with C from Go is straightforward and, with some quick
tests, seems to be comparable in performance to C.

I have tested performance using a simple program that reads through an
Avro file, extracts two fields (a string and a long), and sums up the
longs across all records (the strings are just dropped on the floor).  I
tested with a ~900M Avro file (compressed blocks) that has about 25M
records.  On my machine, the simple C program I built runs through it
in about 42 seconds.  The Go program, which essentially does the
same thing through Go/cgo, accomplishes the same task in about 51 seconds.
An input of a size more common in my domain (a ~270M Avro file containing
~7.5M records) runs in ~15s in C and ~18s in Go.  We regularly process 100s
of files of that size/shape.  This is not taking advantage of any of
the Go concurrency features, and the Go code is largely just the
C code in Go clothing.  But I was pleased to see fairly small
overhead (roughly 20% in both cases).
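
The shape of that test can be sketched in pure Go as below.  This is
only an illustration: Record, RecordReader, and sliceReader are
hypothetical names, and the in-memory reader stands in for whatever
wraps the C file reader.

```go
package main

import (
	"fmt"
	"time"
)

// Record holds the two extracted fields. The field names are
// hypothetical; the test description does not name them.
type Record struct {
	Name  string
	Count int64
}

// RecordReader is a hypothetical iterator over decoded records.
type RecordReader interface {
	Next() (Record, bool)
}

// sliceReader is an in-memory RecordReader used only for this sketch.
type sliceReader struct {
	recs []Record
	pos  int
}

func (r *sliceReader) Next() (Record, bool) {
	if r.pos >= len(r.recs) {
		return Record{}, false
	}
	rec := r.recs[r.pos]
	r.pos++
	return rec, true
}

// SumCounts drains the reader, discards the string field, and returns
// the sum of the long field plus the number of records seen.
func SumCounts(r RecordReader) (sum int64, n int) {
	for {
		rec, ok := r.Next()
		if !ok {
			return sum, n
		}
		_ = rec.Name // extracted but intentionally unused
		sum += rec.Count
		n++
	}
}

func main() {
	r := &sliceReader{recs: []Record{{"a", 1}, {"b", 2}, {"c", 3}}}
	start := time.Now()
	sum, n := SumCounts(r)
	fmt.Printf("sum=%d records=%d elapsed=%s\n", sum, n, time.Since(start))
}
```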

Looking down the road, an idiomatic library should follow a similar
pattern to the Go "encoding/json" package.  That shouldn't be too
difficult.  The only real barrier is time ;-)  I currently have a
task at hand and have enough pieces to accomplish it.  I will circle
back on this, though, as I get more comfortable with Go idioms and
idiosyncrasies.
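
As a rough sketch of what an "encoding/json"-style entry point might
look like (an assumption about one possible design, not an existing
library): the Unmarshal function, the `avro` struct tag, and the use of
a map as a stand-in for a decoded record are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Unmarshal copies values from record into the struct pointed to by v,
// matching keys against a hypothetical `avro:"name"` struct tag. Each
// value must already be assignable to the field's Go type.
func Unmarshal(record map[string]interface{}, v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return errors.New("Unmarshal: v must be a non-nil pointer to a struct")
	}
	st := rv.Elem()
	for i := 0; i < st.NumField(); i++ {
		field := st.Type().Field(i)
		name := field.Tag.Get("avro")
		if name == "" || field.PkgPath != "" {
			continue // untagged or unexported field
		}
		raw, ok := record[name]
		if !ok {
			continue // key absent: leave the field untouched
		}
		val := reflect.ValueOf(raw)
		if !val.IsValid() || !val.Type().AssignableTo(field.Type) {
			return fmt.Errorf("Unmarshal: field %q: cannot assign %T to %s",
				name, raw, field.Type)
		}
		st.Field(i).Set(val)
	}
	return nil
}

// Event is an example target type for the sketch.
type Event struct {
	Name  string `avro:"name"`
	Count int64  `avro:"count"`
}

func main() {
	var e Event
	rec := map[string]interface{}{"name": "click", "count": int64(7)}
	err := Unmarshal(rec, &e)
	fmt.Println(e, err)
}
```

A real implementation would walk the C value (or a native decoder)
rather than a map, but the caller-facing shape would be the same:
declare a tagged struct, pass a pointer, check the error.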

I wanted to share the above though as I view these quick results as promising.

p.s. I also tested using C to convert a record to a JSON char * and
passing that to a Go function that unmarshals it into a Go struct.  This
worked fine but, as one would expect, adds a considerable amount of
overhead: 12 minutes for the same 51-second test noted above.  It
does work, though, as a quick approach.
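
The Go half of that quick approach might look like the sketch below.
The Event type, its field names, and DecodeRecord are hypothetical; in
the cgo version the input string would come from the C char * (for
example via C.GoString) rather than a literal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical target struct with one string and one long.
type Event struct {
	Name  string `json:"name"`
	Count int64  `json:"count"`
}

// DecodeRecord unmarshals one JSON-encoded record into an Event.
func DecodeRecord(js string) (Event, error) {
	var e Event
	err := json.Unmarshal([]byte(js), &e)
	return e, err
}

func main() {
	e, err := DecodeRecord(`{"name":"click","count":7}`)
	fmt.Println(e.Name, e.Count, err)
}
```

Every record is serialized to text on the C side and parsed again on
the Go side, which is consistent with the large slowdown reported.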

On Mar 20, 2014 4:33 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: