-
Versioning of an array of a record
Robin Müller 2010-09-16, 08:09
Hi,
I've read the "Schema Resolution" part of the Avro Specification, so I think that Avro supports versioning of the schema. But when I try to change the following schema, an AvroTypeException is thrown when reading data that was serialized with the old schema:

{
  "name":"BrowserCountArray",
  "type":"record",
  "fields": [
  {
    "name":"BrowserCounts",
    "type":
    {
      "type":"array",
      "items": {
        "name": "BrowserCount",
        "type": "record",
        "fields": [
        {
          "name":"Browser",
          "type":"string"
        }, {
          "name":"Count",
          "type":"int"
        }]
      }
    }
  }]
}

For example, I add a new field to the BrowserCount record like this:

{
  "name":"BrowserCountArray",
  "type":"record",
  "fields": [
  {
    "name":"BrowserCounts",
    "type":
    {
      "type":"array",
      "items": {
        "name": "BrowserCount",
        "type": "record",
        "fields": [
        {
          "name":"Browser",
          "type":"string"
        }, {
          "name":"Count",
          "type":"int"
        }, {
          "name":"Blub",
          "type":"int",
          "default":"0"
        }]
      }
    }
  }]
}

Is it possible to add or remove fields from this record and then use the new schema to read data that was serialized with the old one? Or is there another way to define an array of records that solves this problem?
Thanks, Robin
-
Re: Versioning of an array of a record
Scott Carey 2010-09-16, 08:21
Assuming Java: Are you using a ResolvingDecoder?
Schema resolution happens by default if you are reading Generic or Specific records from an Avro file, but if you are reading data any other way, you have to use a ResolvingDecoder and give it both the expected (reader) schema and the actual (writer) schema.
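
To illustrate what resolving between a reader and a writer schema means for the BrowserCount record from the question, here is a minimal sketch in plain Python. It is not the Avro API and not how ResolvingDecoder is implemented; `resolve_record` and the two schema constants are hypothetical names used only for illustration. It covers just the record rule from the "Schema Resolution" section of the specification: fields are matched by name, writer fields the reader does not know are ignored, and reader fields the writer lacks take the reader's default. The default is written as the JSON number 0 here, since the specification describes defaults for int fields as JSON integers.

```python
# Writer (old) and reader (new) schemas for the array's item record.
WRITER_ITEM = {
    "name": "BrowserCount",
    "type": "record",
    "fields": [
        {"name": "Browser", "type": "string"},
        {"name": "Count", "type": "int"},
    ],
}

READER_ITEM = {
    "name": "BrowserCount",
    "type": "record",
    "fields": [
        {"name": "Browser", "type": "string"},
        {"name": "Count", "type": "int"},
        # Assumption: default given as a JSON number, not the string "0".
        {"name": "Blub", "type": "int", "default": 0},
    ],
}


def resolve_record(datum, writer_schema, reader_schema):
    """Project a record decoded with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in both schemas: take the written value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} has no default")
    # Writer-only fields are never copied, so they are dropped.
    return resolved


old_items = [{"Browser": "Firefox", "Count": 3}, {"Browser": "Opera", "Count": 1}]
new_items = [resolve_record(d, WRITER_ITEM, READER_ITEM) for d in old_items]
print(new_items)
```

The point of the sketch is that both schemas are needed: without the writer schema, the reader has no way to know that Blub was never written and would try to decode it from the bytes.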
-
Re: Versioning of an array of a record
Robin Müller 2010-09-16, 08:34
Thanks for the fast reply. I use the Avro serialization implementation from Voldemort (a key-value store), and it seems that it doesn't use the ResolvingDecoder. But I think there is a way to plug a custom serialization implementation into Voldemort, so I'll give it a try with the ResolvingDecoder.
Greetings, Robin
-
Re: Versioning of an array of a record
Scott Carey 2010-09-16, 16:25
Generally, Avro recommends storing the schema with the data. For a file that means in the header of the file, for a key/value store that means in some system metadata. Any individual store can only keep data serialized with one schema.
In order for schema migration to work, the new code has to have the schema of the old data. Old code should have access to the new data's schema too; then you can support both forwards and backwards compatibility.
-
Re: Versioning of an array of a record
Doug Cutting 2010-09-16, 16:41
On 09/16/2010 09:25 AM, Scott Carey wrote:
> Generally, Avro recommends storing the schema with the data. For a
> file that means in the header of the file, for a key/value store that
> means in some system metadata. Any individual store can only keep
> data serialized with one schema.

Another good pattern is to store the hashcode of the writer's schema with each written instance, then keep written schemas in a separate store, keyed by hashcode. For example, Sam Pullara's done this in his HAvroBase:

http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/

Doug
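
The hash-keyed pattern described above could look roughly like the following sketch in plain Python. Everything in it is an assumption made for illustration: `SchemaStore` and its methods are hypothetical names, the store is an in-memory dict standing in for a real separate store, SHA-256 over sorted-key JSON is just one possible hash (it is not Avro's own canonical form or fingerprint), and the serialized instance is treated as opaque bytes. It is not taken from HAvroBase.

```python
import hashlib
import json


class SchemaStore:
    """Hypothetical in-memory stand-in for the separate schema store."""

    KEY_SIZE = hashlib.sha256().digest_size  # 32 bytes

    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        """Store a schema under its hash and return the hash."""
        # Assumption: sorted-key JSON as the hashed form; this is not
        # Avro's own canonical form or fingerprint.
        text = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(text.encode("utf-8")).digest()
        self._schemas[key] = schema
        return key

    def wrap(self, schema, payload):
        """Prefix the serialized instance with its writer schema's hash."""
        return self.register(schema) + payload

    def unwrap(self, stored):
        """Split a stored value into (writer schema, serialized instance)."""
        key, payload = stored[: self.KEY_SIZE], stored[self.KEY_SIZE :]
        return self._schemas[key], payload


store = SchemaStore()
writer_schema = {
    "name": "BrowserCount",
    "type": "record",
    "fields": [
        {"name": "Browser", "type": "string"},
        {"name": "Count", "type": "int"},
    ],
}
# The payload stands for an already serialized instance (opaque bytes here).
stored = store.wrap(writer_schema, b"\x0eFirefox\x06")
schema, payload = store.unwrap(stored)
```

A reader that gets `schema` back this way has the writer's schema for every value, so values written with different schema versions can sit side by side and each one can still be resolved against the reader's schema.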
-
Re: Versioning of an array of a record
Jeff Hammerbacher 2010-09-17, 15:43
For more discussion of best practices for storing Avro-serialized data structures in a database, see http://www.quora.com/What-is-the-best-way-to-work-with-Avro-serialized-data-structures-in-a-database.