Re: Record extensions?
On Tue, Jun 12, 2012 at 10:38 AM, Christophe Taton <[EMAIL PROTECTED]> wrote:
> I need my server to handle records with fields that can be "freely" extended
> by users, without requiring a recompile and restart of the server.
> The server itself does not need to know how to handle the content of this
> extensible field.
>
> One way to achieve this is to have a bytes field whose content is managed
> externally, but this is very ineffective in many ways.
> Is there another way to do this with Avro?

You could use a very generic schema, like:

{"type":"record", "name":"Value", "fields": [
  {"name":"value", "type": ["int","float","boolean", ...
    {"type":"map", "values":"Value"}]}
]}

This is roughly equivalent to a binary encoding of JSON.  But using a
map forces a field name to be serialized with every field value.  That
not only makes payloads bigger but also makes them slower to construct
and parse.
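
To make the size cost concrete, here is a minimal sketch that hand-rolls
a small part of Avro's binary encoding (zigzag varints for longs,
length-prefixed UTF-8 for strings, a single-block map) and encodes the
same values once as a map and once as a positional record.  The field
names and values are made up for illustration, the helper names are
hypothetical, and the union branch index that the generic schema above
would add to each value is left out, so the real map-based figure would
be somewhat larger.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode, then emit as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_as_map(row: dict) -> bytes:
    """Avro map of longs: one block (count, then key/value pairs), then a 0 terminator."""
    out = bytearray()
    if row:
        out += encode_long(len(row))
        for key, value in row.items():
            out += encode_string(key)  # the field name travels with every value
            out += encode_long(value)
    out += encode_long(0)
    return bytes(out)


def encode_as_record(row: dict, field_order: list) -> bytes:
    """Avro record: field values only, in schema order; names live in the schema."""
    return b"".join(encode_long(row[name]) for name in field_order)


# Illustrative data, not taken from the thread.
row = {"user_id": 42, "score": 7, "age": 30}
as_map = encode_as_map(row)
as_record = encode_as_record(row, ["user_id", "score", "age"])
print(len(as_map), len(as_record))
```

With these three small values the map form spends most of its bytes on
the key strings, while the record form carries only the values.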

Another approach is to include the Avro schema for a value in the record, e.g.:

{"type":"record", "name":"Extensions", "fields": [
  {"name":"schema", "type": "string"},
  {"name":"values", "type": {"type":"array", "items":"bytes"}}
]}

This can make things more compact when there are a lot of values.  For
example, this might be used in a search application where each query
lists the fields it's interested in retrieving and each response
contains a list of records that match the query and contain just the
requested fields.  The field names are not included in each match, but
instead once for the entire set of matches, making this faster and more
compact.
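
A rough sketch of how such an Extensions value might be built and read
back, using only the Python standard library.  The "Match" schema, its
field names, and the helper functions are hypothetical, and only long
fields are handled; a real application would hand the embedded schema to
an Avro library rather than decode by hand.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode, then emit as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Read one zigzag varint starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_match(schema: dict, row: dict) -> bytes:
    """Write field values in schema order, with no field names."""
    return b"".join(encode_long(row[f["name"]]) for f in schema["fields"])


def decode_match(schema: dict, buf: bytes) -> dict:
    """Use the schema's field order to put names back on the values."""
    pos = 0
    row = {}
    for f in schema["fields"]:
        row[f["name"]], pos = decode_long(buf, pos)
    return row


# Hypothetical schema describing just the fields one query asked for.
match_schema = {"type": "record", "name": "Match", "fields": [
    {"name": "doc_id", "type": "long"},
    {"name": "score", "type": "long"},
]}
rows = [{"doc_id": 101, "score": 9}, {"doc_id": 202, "score": 4}]

# Sender: the schema appears once, followed by one bytes item per match.
extensions = {
    "schema": json.dumps(match_schema),
    "values": [encode_match(match_schema, r) for r in rows],
}

# Receiver: parse the schema once, then decode every item against it.
schema = json.loads(extensions["schema"])
decoded = [decode_match(schema, v) for v in extensions["values"]]
print(decoded)
```

The server can pass extensions["values"] along untouched; only a reader
that cares about the contents needs to parse the schema string.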

Finally, if you have a stateful connection then you can send a
schema in the first request then just send bytes encoding instances of
that schema in subsequent requests over that connection.  This again
avoids sending field names with each field value.
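
The schema-first idea can be sketched with an in-memory stream standing
in for the connection.  Everything here is an assumption made for
illustration: the 4-byte length-prefixed framing, the
SchemaFirstReceiver class, and the "Ext" schema are hypothetical, and
this is not Avro's own RPC handshake.

```python
import io
import json
import struct


def write_frame(stream, payload: bytes) -> None:
    """Write one message as a 4-byte big-endian length followed by the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Read one length-prefixed message, or return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return stream.read(length)


class SchemaFirstReceiver:
    """Treats the first frame on a connection as the schema, the rest as instances."""

    def __init__(self):
        self.schema = None
        self.payloads = []

    def feed(self, stream) -> None:
        while (frame := read_frame(stream)) is not None:
            if self.schema is None:
                # First frame: the schema, sent once per connection.
                self.schema = json.loads(frame.decode("utf-8"))
            else:
                # Later frames: encoded instances only, no field names.
                self.payloads.append(frame)


# Hypothetical schema with a single long field.
schema = {"type": "record", "name": "Ext",
          "fields": [{"name": "n", "type": "long"}]}

conn = io.BytesIO()  # stands in for a stateful connection
write_frame(conn, json.dumps(schema).encode("utf-8"))
write_frame(conn, b"\x02")  # Avro binary encoding of the long 1
write_frame(conn, b"\x54")  # Avro binary encoding of the long 42
conn.seek(0)

receiver = SchemaFirstReceiver()
receiver.feed(conn)
print(receiver.schema["name"], receiver.payloads)
```

Because the receiver keeps the schema for the life of the connection,
each later frame can be as small as the encoded values themselves.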

Doug