Avro, mail # dev - A case for adding revision field to Avro schema


Re: A case for adding revision field to Avro schema
Philip Zeyliger 2010-09-22, 06:25
On Tue, Sep 21, 2010 at 8:00 PM, Thiruvalluvan M. G. <[EMAIL PROTECTED]> wrote:

> Thanks, Philip, for your crisp description of what happens with Thrift and
> PB. I had assumed that the community knows the difference between those
> systems and Avro. Your description should help those who don't know and be
> a refresher for those who do.
>
> > What happens if the application doesn't recognize the branch number?  If
> > you're a client of a get_person(id) call, and you were written when
> > Person_v1 was the only one in existence, Avro, today, would do just fine
> > at projecting Person_v2 down into Person_v1 for you.  That's because your
> > reader schema would be v1, and you'd read some data written with v2, and
> > those are compatible.  If you have a "version id", then it's hard to
> > keep old readers compatible with new data.
>
> My proposal was:
>
> "3. Schemas match as per the current matching rules, even if the revisions
> do not match."
>
> That is, since Person_v2 and Person_v1 have the same name "Person" and
> different revisions v2 and v1, they would match according to the current
> rules.
>

I'm beginning to understand your proposal a little bit better.  What happens
when the revisions aren't linear?  (Or do we require them to be?)

For example:

Writer's Schema union:
Person_a: (name)
Person_b: (name, age)

Reader's Schema union:
Person_c: (age)
Person_d: (age, school [default=""])

When "Person_b, Philip, 28" is written, what would a subsequent reader see?
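
To make the question concrete, here is a minimal sketch of one possible reading, not a statement of how Avro or the proposal actually behaves. It assumes that all four schemas share the record name "Person" and differ only in a hypothetical "revision" attribute, that records match by name alone (rule 3 above), that the reader takes the first matching branch of its union, and that fields are then resolved by name with reader defaults filling any gaps. The helpers schema, resolve_record, and resolve_union are invented for illustration and are not part of any Avro library.

```python
# Simplified, hypothetical model of record and union resolution.
# Nothing here is Avro API; it only mimics "match by name, project by field".

NO_DEFAULT = object()  # sentinel: the reader field declares no default


def schema(revision, *fields):
    """Build a toy 'Person' record schema; fields are (name, default) pairs."""
    return {"name": "Person", "revision": revision, "fields": list(fields)}


def resolve_record(writer, reader, datum):
    """Project a datum written with `writer` onto the `reader` schema."""
    writer_fields = {name for name, _ in writer["fields"]}
    result = {}
    for name, default in reader["fields"]:
        if name in writer_fields:
            result[name] = datum[name]      # field present in written data
        elif default is not NO_DEFAULT:
            result[name] = default          # missing, but reader has a default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result                           # writer-only fields are dropped


def resolve_union(writer, reader_union, datum):
    """Assumption: pick the FIRST reader branch whose record name matches."""
    for candidate in reader_union:
        if candidate["name"] == writer["name"]:
            return candidate["revision"], resolve_record(writer, candidate, datum)
    raise ValueError("no reader branch matches the writer schema")


person_a = schema("a", ("name", NO_DEFAULT))
person_b = schema("b", ("name", NO_DEFAULT), ("age", NO_DEFAULT))
person_c = schema("c", ("age", NO_DEFAULT))
person_d = schema("d", ("age", NO_DEFAULT), ("school", ""))

written = {"name": "Philip", "age": 28}

# Reader union in the order given above: Person_c, then Person_d.
print(resolve_union(person_b, [person_c, person_d], written))
# -> ('c', {'age': 28})

# Same schemas, opposite order: a different branch wins.
print(resolve_union(person_b, [person_d, person_c], written))
# -> ('d', {'age': 28, 'school': ''})
```

Under these assumptions the result depends on the order of the reader's union, and a datum written as Person_a cannot be read at all, because neither reader branch has a default for age. Other matching rules (for example, preferring the nearest revision) would give different answers.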

I'm worried that the semantics of reader and writer schemas are already
complicated enough; adding in sets of schemas makes it even trickier.

-- Philip