Avro user mailing list: Default field values required for field deletion


Re: Default field values required for field deletion
Yes, I think you have this right.  If you ever wish to delete a field
from a record and maintain both forward and backward compatibility
then you should specify a default value for that field.  Similarly, if
you add a field then you should supply a default value so that you can
read old data that does not contain that field using the new schema.

Doug
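
The rule above can be sketched in a few lines of Python. This is a simplified model of how a reader resolves record fields against the writer's schema, not the Avro library API: the resolve() helper, the "Row" record and its fields "id" and "x" are all hypothetical names chosen for illustration. It assumes field X carries a default in v1, so reading works in both directions.

```python
# Simplified model of Avro record resolution; NOT the Avro library API.
# Schemas are written as Python dicts in Avro's JSON schema form.
# The record name "Row" and the fields "id" and "x" are hypothetical.

V1 = {"type": "record", "name": "Row", "fields": [
    {"name": "id", "type": "long"},
    {"name": "x", "type": "string", "default": ""},  # default makes deletion safe
]}

V2 = {"type": "record", "name": "Row", "fields": [
    {"name": "id", "type": "long"},                  # field "x" deleted in v2
]}


def resolve(datum, writer, reader):
    """Project a datum written with `writer` onto the `reader` schema."""
    writer_names = {f["name"] for f in writer["fields"]}
    out = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            out[name] = datum[name]          # present in both schemas
        elif "default" in field:
            out[name] = field["default"]     # missing in writer: use reader default
        else:
            raise ValueError("no default for missing field: " + name)
    # Writer fields that the reader does not declare are simply ignored.
    return out


old_record = {"id": 1, "x": "hello"}   # written by a v1 app
new_record = {"id": 2}                 # written by a v2 app

v2_reads_v1 = resolve(old_record, V1, V2)   # extra field "x" is dropped
v1_reads_v2 = resolve(new_record, V2, V1)   # missing "x" takes its default
```

In this model a v2 reader drops the extra field and a v1 reader fills in the default, so old and new apps can coexist while the field is being removed.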

On Mon, Oct 1, 2012 at 9:28 AM, Mark Hayes <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We're using Avro as the storage format for database records, and schema
> evolution is a key feature for us.  I have a question regarding the deletion
> of fields from a record, when a schema is changed.
>
> Let's say a field X that is present in v1 of the schema, but does not define
> a default value, is deleted in v2 of the schema.  There can be a mix of v1
> and v2 records in the database, and a mix of v1 and v2 client apps (apps
> that use v1 or v2 as their writer and reader schema).
>
> If a v1 app reads a v2 record (written by a v2 app), an exception will be
> thrown because the reader schema contains field X, the record being
> deserialized does not contain field X, and the reader schema does not
> contain a default value for field X.
>
> Therefore, our conclusion is that a default value must be defined for each
> field in a schema, in order to support deletion of that field from the
> schema at a future time.
>
> To delete a field that does not define a default value, the only possibility
> would be to upgrade all clients to v2 before using the v2 schema for
> writing.  This is usually impractical in a large distributed system.
>
> My question is:  Does this make sense -- have I got it right?
>
> Thanks in advance,
> --mark
>
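
The failure case described in the question could be caught before a schema change is rolled out. The following is a minimal sketch under stated assumptions: missing_without_default() is a hypothetical helper, not part of Avro, and the schemas (record "Row", fields "id" and "x") are made up to mirror the v1/v2 scenario, with X defined without a default in v1.

```python
# Hypothetical pre-deployment check; NOT part of the Avro library.
# It lists reader fields that the writer schema lacks and that have no
# default, i.e. the fields a reader could not fill in when deserializing.

V1_NO_DEFAULT = {"type": "record", "name": "Row", "fields": [
    {"name": "id", "type": "long"},
    {"name": "x", "type": "string"},   # no default, as in the question
]}

V2 = {"type": "record", "name": "Row", "fields": [
    {"name": "id", "type": "long"},    # field "x" deleted in v2
]}


def missing_without_default(writer, reader):
    """Return names of reader fields absent from writer and lacking a default."""
    writer_names = {f["name"] for f in writer["fields"]}
    return [f["name"] for f in reader["fields"]
            if f["name"] not in writer_names and "default" not in f]


# A v1 app reading a record written by a v2 app: "x" cannot be resolved.
v1_reading_v2 = missing_without_default(V2, V1_NO_DEFAULT)

# A v2 app reading a record written by a v1 app: nothing is missing.
v2_reading_v1 = missing_without_default(V1_NO_DEFAULT, V2)
```

A non-empty result for either direction indicates that a mix of v1 and v2 apps would not be able to read each other's records, which is the situation that forces upgrading all clients first.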