Several new features in C library
Hello everyone,

I've been working on a couple of new features that I hope to have added to the C library in time for the 1.6.0 release, and I wanted to provide a brief overview for those who aren't subscribed to the developers list.

The main item is a new interface for handling Avro values in C code.  It's intended to replace the existing avro_datum_t API.  Initial performance tests show it to be an order of magnitude faster than avro_datum_t in certain test cases.  One nice feature of the new API is that you can provide custom implementations of the value interface, which lets you use existing C types directly as Avro values.  Moreover, the methods for getting and setting the contents of a bytes, fixed, or string value allow for zero-copy implementations, which make it easier to have Avro values that efficiently wrap external buffers.
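
To illustrate the general idea of a pluggable value interface with a zero-copy string getter, here is a minimal, self-contained sketch. It is not the actual Avro C API: the names value_iface_t, value_t, name_record_t, and value_get_string are all hypothetical, and the real interface is described in the documentation linked at [1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified value interface: a table of function pointers.
 * This is an illustration only, not the real Avro C API. */
typedef struct value_iface {
    /* Hand back a pointer to the string contents and its length,
     * without copying the bytes. */
    int (*get_string)(const void *self, const char **str, size_t *len);
} value_iface_t;

/* A value pairs an interface with an opaque pointer to the underlying data. */
typedef struct value {
    const value_iface_t *iface;
    void *self;
} value_t;

/* An existing application type that we want to expose as a string value. */
typedef struct name_record {
    char name[32];
} name_record_t;

/* Custom implementation: the returned pointer aliases the caller's struct,
 * so no bytes are copied. */
static int name_record_get_string(const void *self, const char **str, size_t *len)
{
    const name_record_t *rec = self;
    *str = rec->name;
    *len = strlen(rec->name);
    return 0;
}

static const value_iface_t NAME_RECORD_IFACE = { name_record_get_string };

/* Generic consumer: it knows only the interface, not the concrete type. */
static int value_get_string(const value_t *v, const char **str, size_t *len)
{
    assert(v != NULL && v->iface != NULL);
    return v->iface->get_string(v->self, str, len);
}
```

A caller would fill in a name_record_t, build a value_t from &NAME_RECORD_IFACE and the record's address, and pass it to value_get_string. The pointer it gets back is the record's own name buffer, so it stays valid only as long as the record does.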

I've updated the C API documentation to describe the new interface.  This will automatically get deployed to the avro.apache.org website when we cut the 1.6.0 release; in the meantime, I've put a temporary copy of the new docs at [1].

[1] http://people.apache.org/~dcreager/values.html#_avro_values

In addition to the new value API, I've added two command-line tools, "avrocat" and "avropipe".  avrocat is much like the equivalent "dump an Avro file" tools in the Java and Python libraries; it prints out the contents of an Avro data file, one record per line, using the standard Avro JSON encoding.  avropipe produces output that's similar to the jsonpipe [2] tool, with one scalar value per line, no matter how nested the overall Avro schema is.  This can be useful if you want to use standard Unix tools to process the contents of an Avro file.

[2] https://github.com/dvxhouse/jsonpipe
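
As a rough illustration of the "one scalar value per line" idea, the sketch below flattens a small in-memory tree into path/value lines. It does not use the Avro library and does not reproduce avropipe's exact output. The node_t type, the flatten function, and the tab-separated path format are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory tree standing in for a decoded record. */
typedef enum { NODE_INT, NODE_STRING, NODE_RECORD, NODE_ARRAY } node_kind_t;

typedef struct node {
    node_kind_t kind;
    const char *name;             /* field name when the parent is a record */
    long int_value;               /* used by NODE_INT */
    const char *string_value;     /* used by NODE_STRING */
    const struct node *children;  /* used by NODE_RECORD and NODE_ARRAY */
    size_t child_count;
} node_t;

/* Append "path<TAB>text\n" to out at offset len; returns the new offset.
 * Once the buffer is full, further lines are silently dropped. */
static size_t emit_line(char *out, size_t cap, size_t len,
                        const char *path, const char *text)
{
    if (len < cap) {
        int n = snprintf(out + len, cap - len, "%s\t%s\n", path, text);
        if (n > 0)
            len += (size_t)n;
    }
    return len;
}

/* Walk the tree depth-first and emit one line per scalar leaf.
 * Record fields extend the path with their name, array elements with
 * their index. */
static size_t flatten(const node_t *n, const char *path,
                      char *out, size_t cap, size_t len)
{
    char buf[256];
    size_t i;

    assert(n != NULL && path != NULL);
    switch (n->kind) {
    case NODE_INT:
        snprintf(buf, sizeof buf, "%ld", n->int_value);
        return emit_line(out, cap, len, path, buf);
    case NODE_STRING:
        snprintf(buf, sizeof buf, "\"%s\"", n->string_value);
        return emit_line(out, cap, len, path, buf);
    case NODE_RECORD:
        for (i = 0; i < n->child_count; i++) {
            snprintf(buf, sizeof buf, "%s/%s", path, n->children[i].name);
            len = flatten(&n->children[i], buf, out, cap, len);
        }
        return len;
    case NODE_ARRAY:
        for (i = 0; i < n->child_count; i++) {
            snprintf(buf, sizeof buf, "%s/%zu", path, i);
            len = flatten(&n->children[i], buf, out, cap, len);
        }
        return len;
    }
    return len;
}
```

Because every line carries its full path, line-oriented tools such as grep, cut, or sort can pick out individual fields without parsing nested structure.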

The new features haven't been committed to SVN yet, but they're ready for a wider audience to review and test.  If you're a user of the Avro C library, I encourage you to take a look.  I have a GitHub tracking branch set up [3] for those who are interested.

[3] https://github.com/dcreager/avro/tree/avropipe

You can also follow along on AVRO-396 and AVRO-837 on the issue tracker for more details.

cheers
–doug