Avro, mail # user - Re: Avro vs Json


Re: Avro vs Json
Tatu Saloranta 2012-08-14, 02:33
On Mon, Aug 13, 2012 at 3:59 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> It is worth keeping in mind that explicit external schema is another
>> cost in not just designing but also maintaining the system. As such,
>> it is most useful for closely-coupled internal system, where one
>> controls both ends. This may be the case for computing pipelines a
>> single team owns.
>
>
> Our experiences have been quite the opposite. When the developer producing
> data was the same as the developer writing code to consume it, json worked
> fine since the developer knew what fields to expect. As our company grew,
> this turned into tribal knowledge and the approach did not scale. That's
> when having schemas is critical: when one team produces data and many others
> consume it. The cost is that the producer needs to publish the schema for
> others to discover.

Interesting, good point.

I was thinking of the main cost as maintenance, i.e. if and when the
format changes, rather than the upfront effort (although that is more
visible). That cost depends on the amount of change, if any, and on
the effort other systems need to adapt. Avro does have better support
for schema evolution, at least in theory, so that could help too.
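
To make the schema evolution point concrete, here is a rough sketch of the
field-matching rules Avro applies when a reader's schema differs from the
writer's: fields only the writer knows are dropped, and fields only the
reader knows are filled from the reader's default. This is a toy
simulation over plain Python dicts, not the Avro library; the `resolve`
helper and the `User` schemas are made up for illustration, and type
promotion, aliases, and nested records are left out.

```python
# Toy illustration of Avro-style record schema resolution.
# NOT the Avro library: `resolve` and both schemas are hypothetical.

def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field known to both sides: take the written value.
            result[name] = record[name]
        elif "default" in field:
            # Field added on the reader side: fall back to its default.
            result[name] = field["default"]
        else:
            # Reader requires a field the writer never wrote.
            raise ValueError(f"no value or default for field {name!r}")
    # Writer-only fields (here "nick") are simply ignored.
    return result


# Old producer schema (assumed for the example).
WRITER_V1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "nick", "type": "string"},
]}

# Newer consumer schema: drops "nick", adds "email" with a default.
READER_V2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string", "default": ""},
]}

old_record = {"id": 7, "nick": "tatu"}
print(resolve(old_record, WRITER_V1, READER_V2))  # {'id': 7, 'email': ''}
```

The maintenance cost shows up in the last branch: a field added without a
default cannot be resolved against old data, so every consumer of that
data has to coordinate with the producer.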

-+ Tatu +-