Avro >> mail # dev >> Several topics:  Naming, in memory representation of Avro objects, future format enchancements

Re: Several topics: Naming, in memory representation of Avro objects, future format enchancements
I'm chiming in a bit late here.
> [avrogen] If it is intended to be the primary place that users of Avro
> define and maintain schemas, I'll check it out soon!

Avrogen is still young, but I do think that it's the right approach to
maintaining schemas.  Stuff like inclusion of other files should be handled
at that layer.  If you find it lacking, please do let us know; it's pretty
easy to work on.
> >> * In-memory data representation
> >>    Avro is very good at reducing serialized size, but doesn't optimize
> >> memory footprint.  None of this is a big deal for the typical Hadoop use
> >> case, but for my use cases, where I want to serialize these things into BDBs
> >> or some other key/value store, in-memory footprint is critical.  Extra
> >> nested object references can easily consume a lot of memory and reduce the
> >> effectiveness of in-memory caching for key/value stores.  Another case where
> >> you would want to minimize memory use is a map-side join.

There's a lot more freedom in manipulating the APIs than there is in
manipulating the data format.  I'm not as convinced of the readability of
the Encoder/Decoder API as Doug is, but I do think there's a lot of tweaking
that could be done to make the generated specific code nicer.

> Oddly, adding names to unions, arrays, etc. has the potential to
> _reduce_ the verbosity of the JSON.  Yes, one would "have to" name the
> array, but one wouldn't be forced to create an anonymous record inside a
> field, since fields must be named and almost everything in a *.avsc is a
> field.

> Naming arrays and maps would make them less natural in most programming
> languages, where arrays and maps are unnamed.

I suspect we could add typedefs to avrogen.  Those might make sense for
this sort of thing.

>  Languages with runtime typing (Java, Python, Ruby, etc.) most naturally
> represent unions implicitly with runtime typing.

The bit that bugs me most here is the casting this forces in Java.  We could
add helper methods to the specific API that would give folks compile-time
help, but it's weird that the methods would be "getString()" instead of
"getFooBar()".
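The following is a minimal plain-Java sketch with no Avro dependency. It makes the casting concern concrete using a hypothetical generated class whose field fooBar holds the union ["string", "int"]. The names ExampleRecord, getFooBarAsString() and getFooBarAsInt() are illustrative assumptions, not what Avro's code generator emits. Putting both the field name and the branch type into the accessor name is only one possible way around the bare getString() naming.

```java
// Sketch only: ExampleRecord and its accessors are hypothetical, not
// generated by Avro. The union field fooBar has schema ["string", "int"].
public class UnionAccessorSketch {

    static final class ExampleRecord {
        // Union value held as a plain Object; its runtime class says which
        // branch is set (Avro's Java runtime may hand back strings as
        // org.apache.avro.util.Utf8, hence the CharSequence check below).
        public Object fooBar;

        // Typed helper: the string branch, or null if another branch is set.
        public String getFooBarAsString() {
            return (fooBar instanceof CharSequence) ? fooBar.toString() : null;
        }

        // Typed helper: the int branch, or null if another branch is set.
        public Integer getFooBarAsInt() {
            return (fooBar instanceof Integer) ? (Integer) fooBar : null;
        }
    }

    public static void main(String[] args) {
        ExampleRecord r = new ExampleRecord();
        r.fooBar = 42;

        // Without helpers: the caller tests and casts; a wrong cast such as
        // (String) r.fooBar compiles fine and fails only at runtime.
        if (r.fooBar instanceof Integer) {
            int n = (Integer) r.fooBar;
            System.out.println("cast: " + n);
        }

        // With helpers: the return types are checked at compile time.
        System.out.println("helper int: " + r.getFooBarAsInt());
        System.out.println("helper string: " + r.getFooBarAsString());
    }
}
```

Running main prints "cast: 42", "helper int: 42" and "helper string: null". The wrong-branch accessor returns null rather than throwing, which is a design choice of this sketch.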
>  Maven

Scott, if you have Maven-fu that you'd like to share, please do!  The goal
is to make Avro available as easily as possible, in as many ways as possible.

-- Philip