Cool. Thanks for the response.
I've had early success reading Avro files from Go by calling the Avro
C library through cgo. It was relatively straightforward, if a tad
tedious: the new "value" interface in the C library uses a lot of
macros, and cgo cannot (AFAIK) call macros directly, so I needed to
create C wrapper functions for the macros. I did this for about 8 or
so macros (just the ones I needed as a proof of concept), but that
covered most everything you'd expect on the reading side: generic
readers, retrieving the writer schema, iterating over record values,
teasing out unions/discriminant branches, retrieving string & long
values, getting fields by index and by name, and the corresponding
incref/decref. Aside from the macros, integrating with C from Go is
straightforward and, with some quick tests, seems to be comparable in
performance to C.
I have tested performance using a simple script that reads through an
Avro file, extracts two fields (a string and a long), and sums up the
longs across all records (the strings are just dropped on the floor).
I tested with a ~900M Avro file (compressed blocks) that has about 25M
records. On my machine, the simple C library I built runs through it
in about 42 seconds. The Go library I have, which does essentially the
same thing with Go/cgo, accomplishes the same task in about 51
seconds. A more commonly sized input in my domain (a ~270M Avro file
containing ~7.5M records) runs in ~15s in C and ~18s in Go. We
regularly process 100s of files of that size/shape. This is not taking
advantage of any of Go's concurrency features, and the Go code is
largely just the C code in Go clothing. But I was pleased to see pretty negligible
Looking down the road, an idiomatic library should follow a similar
pattern to the Go "encoding/json" package. That shouldn't be too
difficult. The only real barrier is time ;-) I currently have a
task at hand and have enough pieces to accomplish it. I will circle
back on this though as I get a little more comfort with Go idioms and
I wanted to share the above though as I view these quick results as promising.
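
To make the "encoding/json"-style idea above a bit more concrete, here is a rough sketch of what such an API could look like. It is only an assumption about the shape, not an existing library: Record, the `avro:"..."` struct tag, Unmarshal, and Event are all hypothetical names, and a plain Go map stands in for a record that a real implementation would get from the C reader.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Record is a hypothetical stand-in for one decoded Avro record.
type Record map[string]interface{}

// Unmarshal copies values from rec into the struct pointed to by v,
// matching record field names against a hypothetical `avro` struct tag.
func Unmarshal(rec Record, v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return errors.New("Unmarshal: v must be a non-nil pointer to a struct")
	}
	st := rv.Elem()
	for i := 0; i < st.NumField(); i++ {
		field := st.Type().Field(i)
		name := field.Tag.Get("avro")
		raw, ok := rec[name]
		// Skip untagged or unexported fields and absent or nil values.
		if name == "" || !ok || raw == nil || !st.Field(i).CanSet() {
			continue
		}
		val := reflect.ValueOf(raw)
		if !val.Type().AssignableTo(field.Type) {
			return fmt.Errorf("Unmarshal: field %q: cannot assign %s to %s",
				name, val.Type(), field.Type)
		}
		st.Field(i).Set(val)
	}
	return nil
}

// Event is a hypothetical target struct with one string and one long.
type Event struct {
	Name  string `avro:"name"`
	Count int64  `avro:"count"`
}

func main() {
	var e Event
	err := Unmarshal(Record{"name": "a", "count": int64(3)}, &e)
	fmt.Println(e, err)
}
```

The caller only declares a tagged struct and never touches the C value API, which is the main appeal of following the json package's pattern.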
p.s. I also tested using C to convert a record to a JSON char* and
pass that to a Go function that unmarshals it into a Go struct. This
worked fine but, as one would expect, adds a considerable amount of
overhead: 12 minutes for the same 52 second test noted above. It
does work, though, as a quick approach.
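
A minimal sketch of the Go half of that JSON approach, assuming a record with one string field and one long field. The row type, the "key"/"value" field names, and decodeRow are made up for illustration; in a cgo setup the input string would typically come from C.GoString on the char* rather than from a literal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// row mirrors a hypothetical record with one string and one long field.
type row struct {
	Key   string `json:"key"`
	Value int64  `json:"value"`
}

// decodeRow unmarshals one record's JSON text into a row.
func decodeRow(js string) (row, error) {
	var r row
	err := json.Unmarshal([]byte(js), &r)
	return r, err
}

func main() {
	r, err := decodeRow(`{"key": "abc", "value": 12345}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Key, r.Value)
}
```

Each record pays for a JSON encode in C, a string copy into Go, and a JSON parse, which is presumably where most of the extra time goes.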
On Mar 20, 2014 4:33 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: