Avro, mail # dev - schema repositories?


Re: schema repositories?
Scott Carey 2012-07-10, 20:37
Jay,

This would be fantastic.  I can sponsor getting this into Avro and help
out.  Please use AVRO-1006 or, if it does not seem appropriate, create
another issue.

I would like to start with what you have; since it is already used in
production, it seems like the right starting place.  But the community will
need to form consensus on what to do; we will discuss that in the JIRA.
Thanks!

-Scott
On 7/10/12 10:53 AM, "Jay Kreps" <[EMAIL PROTECTED]> wrote:

>I noticed in AVRO-1006 there was a mention of standardizing on some kind of
>schema repository that would maintain a central set of all versions of a
>schema and allow a way to reference schemas by id.
>
>At LinkedIn we have standardized (almost) all of our persistent data on
>Avro and we have a repository like this for managing schemas. Messages are
>stored with the schema in Hadoop, but for systems that store rows
>independently, like databases or messaging, we instead store a schema id
>with each row/message. We would love for there to be an open source version
>of this to make it possible to open up our other tools for compatibility
>checking, ETL, and other things that depend on the service.
>
>The service itself is basically a REST service that maintains schemas. Each
>schema has a "source" that it is associated with (the table or messaging
>topic or whatever) and a unique id. Schemas can be fetched by id, or you can
>get the latest schema for a given source. Having the notion of sources
>allows us to do two things: (1) enforce a compatibility model on schema
>changes (no backwards-incompatible changes, for various definitions of
>backwards compatibility), and (2) allow our Hadoop ETL to project all
>messages forward to the latest schema (since AvroFile requires a single
>schema, not a per-row schema).
>
>If the Avro project is interested in adopting an official repository that
>would be really nice. It is frankly a pretty trivial piece of code, but
>standardization would allow interoperability between things. I would be
>willing to either open source our repository implementation or do a
>from-scratch one if we come up with more requirements.
>
>-Jay
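
For readers following the thread, the repository model Jay describes (each
schema has a source and a unique id, schemas are fetched by id or as the
latest for a source, and changes go through a compatibility check) could be
sketched roughly as below. This is an in-memory illustration only, not
LinkedIn's implementation and not an Avro API: the class SchemaRepository,
its method names, and the rule added_fields_have_defaults are all
hypothetical, and the rule is just one simplified stand-in for the "various
definitions of backwards compatibility" mentioned above.

```python
import json
from typing import Callable, Dict, List


def added_fields_have_defaults(old: dict, new: dict) -> bool:
    """Hypothetical, simplified rule: every field that the new record
    schema adds (relative to the old one) must declare a default."""
    old_names = {f["name"] for f in old.get("fields", [])}
    return all(
        f["name"] in old_names or "default" in f
        for f in new.get("fields", [])
    )


class SchemaRepository:
    """In-memory stand-in for the REST service described in the thread."""

    def __init__(
        self,
        is_compatible: Callable[[dict, dict], bool] = added_fields_have_defaults,
    ) -> None:
        self._by_id: Dict[int, dict] = {}               # schema id -> schema
        self._ids_by_source: Dict[str, List[int]] = {}  # source -> ids, oldest first
        self._is_compatible = is_compatible

    def register(self, source: str, schema_json: str) -> int:
        """Store a new schema version for a source and return its id."""
        schema = json.loads(schema_json)
        ids = self._ids_by_source.setdefault(source, [])
        if ids:
            latest = self._by_id[ids[-1]]
            if schema == latest:
                return ids[-1]  # unchanged schema: reuse the existing id
            if not self._is_compatible(latest, schema):
                raise ValueError(f"incompatible schema change for source {source!r}")
        new_id = len(self._by_id) + 1
        self._by_id[new_id] = schema
        ids.append(new_id)
        return new_id

    def get(self, schema_id: int) -> dict:
        """Fetch a schema by its unique id."""
        return self._by_id[schema_id]

    def latest(self, source: str) -> dict:
        """Fetch the most recently registered schema for a source."""
        return self._by_id[self._ids_by_source[source][-1]]
```

In this sketch a writer would call register() once per schema change and
store the returned id with each row or message, while a reader would call
get() with that id, or latest() when projecting old data forward.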