The best practice is usually to use a flexible schema with a union
value rather than transmit a schema each time. This restricts the
possibilities to the defined set, and the type selected in the branch is
available on the decoding side. In the case above the number of variants
is not so large that this approach becomes unwieldy, and there may be
benefits to knowing the type on the other side without inspecting the
data.
You can construct an Avro schema that represents all possible data
variants, effectively tagging the types of every field during
serialization using unions. However, none of the Avro APIs are (yet)
optimized for this use case, it would be somewhat clumsy to work with,
and it is less space efficient. Other serialization systems are a better
fit for completely open-ended data schemas.
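As a sketch of the union approach described above (the record and field names here are hypothetical, not from the original message), a schema for typed key-value pairs can make the value field a union of the primitive types plus a map, so the branch chosen at encode time tells the decoder which type was written:

```python
import json

# Hypothetical union schema for typed key-value pairs: "value" is a union
# of primitive types and a string map. The union branch recorded during
# serialization identifies the type on the decoding side.
KV_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "KeyValue",
  "fields": [
    {"name": "key", "type": "string"},
    {"name": "value", "type": [
      "null", "boolean", "int", "long", "float", "double", "string",
      {"type": "map", "values": "string"}
    ]}
  ]
}
""")

# The set of allowed variants is fixed by the union's branches.
branches = KV_SCHEMA["fields"][1]["type"]
print(len(branches))  # 8 branches in this sketch
```

Because both sides hold this schema in advance, only the small branch index plus the value travels on the wire, not the schema itself.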
One can look at Avro as a serialization system, but I see it more as a
system for describing your data. It provides tools for describing and
transforming data that exists in related forms (e.g. older or newer schema
versions) to the form you want to see (e.g. current schema). Whether this
data is serialized or an object graph is less important than that it
conforms to a schema. A transformation between a serialized form and an
object graph is one use case of many possibilities.
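To make the "describing and transforming data" point concrete, here is a toy sketch of the idea behind reader-side schema resolution (this is NOT the real Avro library; schema and field names are invented): a record written under an older schema is reinterpreted under a newer reader schema, with missing fields filled from the reader's defaults.

```python
# Toy sketch of Avro-style schema resolution, not the avro package itself.
READER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        # New field, absent from older writer schemas, so it has a default.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

def resolve(record, reader_schema):
    """Project a decoded record onto the reader schema, applying defaults."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError("no value or default for field %r" % field["name"])
    return out

old_record = {"name": "gaurav"}  # decoded from data written with the older schema
print(resolve(old_record, READER_SCHEMA))
# {'name': 'gaurav', 'email': None}
```

The real resolution rules (type promotion, union matching, and so on) are richer than this, but the shape is the same: the schema describes the form you want, and the data is transformed to conform to it.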
Think about your use case from that perspective. Ask whether this is data
that benefits from being described with an Avro schema and then
interpreted as conforming to that schema. If it is completely open-ended,
there may be little benefit and significant overhead.
You can also embed JSON or binary JSON in Avro data fairly easily using
a string or bytes field.
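For example (a minimal sketch; the envelope schema and field names are made up for illustration), the open-ended part of a message can ride in a string field of an otherwise well-described record:

```python
import json

# Hypothetical envelope schema: well-described metadata plus an open-ended
# payload carried as a JSON-encoded string field.
ENVELOPE_SCHEMA = {
    "type": "record",
    "name": "Envelope",
    "fields": [
        {"name": "kind", "type": "string"},
        {"name": "payload_json", "type": "string"},  # embedded JSON blob
    ],
}

# Producer side: serialize the open-ended part to JSON before Avro encoding.
payload = {"anything": ["goes", 1, True]}
record = {"kind": "event", "payload_json": json.dumps(payload)}

# Consumer side: Avro describes the envelope; json decodes the blob.
decoded = json.loads(record["payload_json"])
print(decoded == payload)  # True
```

This keeps the schema-described and schema-free parts of the data cleanly separated, at the cost of a second encoding step for the embedded portion.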
On 12/7/11 9:10 AM, "Gaurav Nanda" <[EMAIL PROTECTED]> wrote:
>I agree that in this case JSON would be equally helpful. But in my
>application there is one more type of message, where untagged data can
>provide compact data encoding. So to maintain consistency, I preferred
>to send these kinds of messages also using Avro.
>@where untagged data can provide compact data encoding.
>In that case also, my schema has to be dynamically generated (i.e. at
>runtime), so it has to be passed to the client. So would Avro be better
>than compressed JSON in that case?
>On Wed, Dec 7, 2011 at 9:17 PM, Tatu Saloranta <[EMAIL PROTECTED]> wrote:
>> On Wed, Dec 7, 2011 at 5:16 AM, Gaurav <[EMAIL PROTECTED]> wrote:
>>> We have a requirement to send typed (key-value) pairs from server to
>>> clients (in various languages).
>>> Value can be one of the primitive types or a map of the same (string, Object).
>>> One option is to construct the record schema on the fly; the second option is to
>>> use unions to write the schema in a general way.
>>> The problem with 1 is that we have to construct the schema every time from the
>>> keys and then attach the entire schema string to a relatively small message.
>>> But with the second schema, you don't need to write the schema on the wire as it
>>> is already present with the client.
>>> I have written one such sample schema:
>>> Do you guys think writing something of this sort makes sense, or is
>>> there a better approach to this?
>> For this kind of loose data, perhaps JSON would serve you better,
>> unless you absolutely have to use Avro?
>> -+ Tatu +-