Avro >> mail # dev >> A case for adding revision field to Avro schema


Re: A case for adding revision field to Avro schema
On Tue, Sep 21, 2010 at 8:00 PM, Thiruvalluvan M. G. <[EMAIL PROTECTED]> wrote:

> Thanks Philip for your crisp description of what happens with Thrift and PB.
> I had assumed that the community knows the difference between those systems
> and Avro. Your description should help those who don't know and be a
> refresher for those who know.
>
> > What happens if the application doesn't recognize the branch number?  If
> > you're a client of a get_person(id) call, and you were written when
> > Person_v1 was the only one in existence, Avro, today, would do just fine
> > at projecting Person_v2 down into Person_v1 for you.  That's because your
> > reader schema would be v1, and you'd read some data written with v2, and
> > those are compatible.  If you have a "version id", then it's hard to do
> > compatibility of old readers reading new data.
>
> My proposal was:
>
> "3. Schemas match as per the current matching rules, even if the revisions
> do not match."
>
> That is, since Person_v2 and Person_v1 have the same name "Person" and
> different revisions v2 and v1, they would match according to the current
> rules.
>

I'm beginning to understand your proposal a little bit better.  What happens
when the revisions aren't linear?  (Or do we require them to be?)

For example:

Writer's Schema union:
Person_a: (name)
Person_b: (name, age)

Reader's Schema union:
Person_c: (age)
Person_d: (age, school [default=""])

When "Person_b, Philip, 28" is written, what would a subsequent reader see?
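
To make the question concrete, here is a rough Python sketch of the example
above. It is a simplified model of name-based field projection (reader fields
are filled from the writer's data by name, or from a default), not the Avro
library and not the full resolution rules. The helper names project and
candidates are made up for illustration, and it assumes, per the proposal, that
records with different revisions are still allowed to match each other.

```python
# Simplified model of record projection by field name; NOT the Avro library.
# All names here (project, candidates, MISSING) are hypothetical.
MISSING = object()  # marks a reader field that has no default


def project(writer_fields, reader_fields, datum):
    """Project a written record onto a reader record.

    writer_fields: list of field names in the writer's record
    reader_fields: list of (name, default) pairs; default is MISSING if absent
    datum: dict of written values keyed by writer field name
    Returns the projected dict, or None if a reader field cannot be filled.
    """
    result = {}
    for name, default in reader_fields:
        if name in writer_fields:
            result[name] = datum[name]      # field present in written data
        elif default is not MISSING:
            result[name] = default          # fall back to the reader default
        else:
            return None                     # no value and no default
    return result


# The two unions from the example; revisions a..d are just labels here.
writer_union = {"Person_a": ["name"], "Person_b": ["name", "age"]}
reader_union = {
    "Person_c": [("age", MISSING)],
    "Person_d": [("age", MISSING), ("school", "")],
}


def candidates(writer_branch, datum):
    """Return every reader branch the written record can be projected onto."""
    out = {}
    for reader_branch, reader_fields in reader_union.items():
        projected = project(writer_union[writer_branch], reader_fields, datum)
        if projected is not None:
            out[reader_branch] = projected
    return out


matches_b = candidates("Person_b", {"name": "Philip", "age": 28})
matches_a = candidates("Person_a", {"name": "Philip"})
print(matches_b)
print(matches_a)
```

Under these assumptions the Person_b record projects onto both Person_c and
Person_d, so the reader needs some extra rule to pick one, while a Person_a
record projects onto neither because "age" has no default.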

I'm worried that the semantics of reader and writer schemas are already
complicated enough; adding in sets of schemas makes it even trickier.

-- Philip