Avro, mail # user - Re: Avro vs Json


Re: Avro vs Json
Tatu Saloranta 2012-08-13, 17:47
On Sun, Aug 12, 2012 at 7:42 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
> The benefit of having a schema associated with your data should not be
> understated. I think when debating whether to use JSON or some other data
> serialization format that has a schema (like Avro), you should choose the
> latter. The schema support alone will pay dividends over the long run.

I would argue it is one of those things that is overstated because it
is intuitively attractive.
It is worth keeping in mind that an explicit external schema is another
cost, not just in designing the system but also in maintaining it. As
such, it is most useful for closely coupled internal systems, where one
controls both ends. This may be the case for computing pipelines that a
single team owns.

Put another way: both the benefits and the costs of schemas accumulate
over the long run, and their ratio ultimately determines which one wins.
And yet that ratio is very hard to forecast in advance.
What can be said is that maintaining a schema-less system is cheaper than
maintaining a schema. The value of a schema, on the other hand, is much
harder to estimate a priori.

-+ Tatu +-