-
Avro versioning and SpecificDatum's
Alex Holmes 2011-09-19, 12:12

Hi,

I'm starting to play with how I can support versioning with Avro. I created an initial schema, code-generated some Java classes using "org.apache.avro.tool.Main compile protocol", and then used the DataFileWriter (with a SpecificDatumWriter) to serialize my objects to a file.

I then modified my original schema by adding, deleting and renaming some fields, creating version 2 of the schema. After re-creating the Java classes I attempted to read the version 1 file using the DataFileStream (with a SpecificDatumReader), and this is throwing an exception.

Is versioning supported in conjunction with the SpecificDatum* reader/writer classes, or do I have to work at the GenericDatum level for this to work?

Many thanks,
Alex
-
Re: Avro versioning and SpecificDatum's
Chris Wilkes 2011-09-19, 15:22

I'm interested in this as well. For now I've put my versioning in the package namespace of the Avro definition, i.e.:

    com.example.avro.v1.Car
    com.example.avro.v2.Car

After all my documents that used v1.Car have been reprocessed and are out of use, I delete the old definition.

Chris
-
Re: Avro versioning and SpecificDatum's
Doug Cutting 2011-09-19, 16:35

On 09/19/2011 05:12 AM, Alex Holmes wrote:
> I then modified my original schema by adding, deleting and renaming
> some fields, creating version 2 of the schema. After re-creating the
> Java classes I attempted to read the version 1 file using the
> DataFileStream (with a SpecificDatumReader), and this is throwing an
> exception.

This should work. Can you provide more detail? What is the exception? A reproducible test case would be great to have.

Thanks,

Doug
-
Re: Avro versioning and SpecificDatum's
Scott Carey 2011-09-19, 18:16

I version with SpecificDatum objects using Avro data files and it works fine.

I have seen problems arise if a user is configuring or reconfiguring the schemas on the DatumReader passed into the construction of the DataFileReader.

In the case of SpecificDatumReader, it is as simple as:

    DatumReader<T> reader = new SpecificDatumReader<T>(T.class);
    DataFileReader<T> fileReader = new DataFileReader<T>(file, reader);
-
Re: Avro versioning and SpecificDatum's
Rohini U 2011-09-19, 18:23

I have also seen this issue. When I write an Avro object using SpecificDatumWriter and read it back using SpecificDatumReader, it complains that the schemas do not match, even though I specify reader and writer schemas.

Regards,
Rohini
-
Re: Avro versioning and SpecificDatum's
Scott Carey 2011-09-19, 18:31

What if you don't specify the schemas?

The writer schema is in the data file, and is configured automatically if unset. The reader schema is in the class, and is configured automatically in the SpecificDatumReader constructor.
-
Re: Avro versioning and SpecificDatum's
Alex Holmes 2011-09-20, 01:08

I'm trying to put together a simple test case to reproduce the exception. While I was creating the test case, I hit this behavior which doesn't seem right, but maybe it's my misunderstanding of how forward/backward compatibility should work:

Schema v1:

    {"name": "Record", "type": "record",
     "fields": [
       {"name": "name", "type": "string"},
       {"name": "id", "type": "int"}
     ]
    }

Schema v2:

    {"name": "Record", "type": "record",
     "fields": [
       {"name": "name_rename", "type": "string", "aliases": ["name"]},
       {"name": "new_field", "type": "int", "default":"0"}
     ]
    }

In the 2nd version I:

- removed field "id"
- renamed field "name" to "name_rename"
- added field "new_field"

I write the v1 data file:

    // Build one v1 record from its two fields.
    public static Record createRecord(String name, int id) {
      Record record = new Record();
      record.name = name;
      record.id = id;
      return record;
    }

    // Write two v1 records to the stream as an Avro data file.
    public static void writeToAvro(OutputStream outputStream)
        throws IOException {
      DataFileWriter<Record> writer =
          new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
      writer.create(Record.SCHEMA$, outputStream);

      writer.append(createRecord("r1", 1));
      writer.append(createRecord("r2", 2));

      writer.close();
      outputStream.close();
    }

I wrote a version-agnostic Read class:

    // Read every record from the stream and print its fields.
    public static void readFromAvro(InputStream is) throws IOException {
      DataFileStream<Record> reader = new DataFileStream<Record>(
          is, new SpecificDatumReader<Record>());
      for (Record a : reader) {
        System.out.println(ToStringBuilder.reflectionToString(a));
      }
      IOUtils.cleanup(null, is);
      IOUtils.cleanup(null, reader);
    }

Running the Read code against the v1 data file, with the v1 code-generated classes in the classpath, produced:

    Record@6a8c436b[name=r1,id=1]
    Record@6baa9f99[name=r2,id=2]

If I run the same code, but with just the v2 generated classes in the classpath, I get:

    Record@39dd3812[name_rename=r1,new_field=1]
    Record@27b15692[name_rename=r2,new_field=2]

The name_rename field seems to be good, but why would "new_field" inherit the values of the deleted field "id"?

Cheers,
Alex
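One possible reading of the output above, offered as a sketch rather than a description of Avro's internals: if the reader ends up decoding with only the schema stored in the file, values arrive in the writer's field order and are copied into the generated class by index, not by name. The standard-library model below illustrates that effect. `SlotRecord` and `fillByPosition` are hypothetical names, and `IllegalStateException` stands in for Avro's own exception type; only the index-based `put` mirrors something visible in this thread (the `Record.put` frame in the stack trace reported in the next message).

```java
import java.util.Arrays;
import java.util.List;

public class PositionalDecodeSketch {

    /** Hypothetical stand-in for a generated record class: slots addressed by index. */
    public static class SlotRecord {
        private final String[] fieldNames;
        private final Object[] values;

        public SlotRecord(String... fieldNames) {
            this.fieldNames = fieldNames;
            this.values = new Object[fieldNames.length];
        }

        /** Index-based setter; rejects an index the class has no slot for. */
        public void put(int index, Object value) {
            if (index < 0 || index >= values.length) {
                throw new IllegalStateException("Bad index");
            }
            values[index] = value;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fieldNames.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(fieldNames[i]).append('=').append(values[i]);
            }
            return sb.toString();
        }
    }

    /** Copies the writer's values into the record purely by position, ignoring field names. */
    public static SlotRecord fillByPosition(SlotRecord record, List<Object> writerValues) {
        for (int i = 0; i < writerValues.size(); i++) {
            record.put(i, writerValues.get(i));
        }
        return record;
    }

    public static void main(String[] args) {
        // Values in the order the v1 schema wrote them: [name, id].
        List<Object> v1Values = Arrays.<Object>asList("r1", 1);

        // Two-slot v2 layout: the old "id" value lands in the "new_field" slot.
        // prints "name_rename=r1,new_field=1"
        System.out.println(fillByPosition(new SlotRecord("name_rename", "new_field"), v1Values));

        // One-slot v2 layout: there is no slot 1, so the second put fails.
        try {
            fillByPosition(new SlotRecord("name_rename"), v1Values);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Bad index"
        }
    }
}
```

In this model the two-field v2 layout silently accepts the old "id" value in the "new_field" slot, while a v2 layout with fewer slots than the data has values fails outright.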
-
Re: Avro versioning and SpecificDatum's
Alex Holmes 2011-09-20, 01:14

OK, I was able to reproduce the exception.

v1:

    {"name": "Record", "type": "record",
     "fields": [
       {"name": "name", "type": "string"},
       {"name": "id", "type": "int"}
     ]
    }

v2:

    {"name": "Record", "type": "record",
     "fields": [
       {"name": "name_rename", "type": "string", "aliases": ["name"]}
     ]
    }

Step 1. Write Avro file using v1 generated class
Step 2. Read Avro file using v2 generated class

    Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
        at Record.put(Unknown Source)
        at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
        at Read.readFromAvro(Unknown Source)
        at Read.main(Unknown Source)

The code to write/read the Avro file didn't change from my previous message.
-
Re: Avro versioning and SpecificDatum's
Scott Carey 2011-09-20, 06:22

That looks like a bug. What happens if there is no aliasing/renaming involved? Aliasing is a newer feature than field addition, removal, and promotion.

This should be easy to reproduce. Can you file a JIRA ticket? We should discuss this further there.

Thanks!
-
Re: Avro versioning and SpecificDatum's
Alex Holmes 2011-09-20, 10:26

Thanks, I'll add a bug.

As an FYI, even without the alias (retaining the original field name), just removing the "id" field yields the exception.
-
Re: Avro versioning and SpecificDatum's
Alex Holmes 2011-09-20, 10:44

Created the following ticket:

https://issues.apache.org/jira/browse/AVRO-891

Thanks,
Alex
-
Re: Avro versioning and SpecificDatum's
Scott Carey 2011-09-20, 18:51

As Doug mentioned in the ticket, the problem is likely:

    new SpecificDatumReader<Record>()

This should be

    new SpecificDatumReader<Record>(Record.class)

which sets the reader to resolve to the schema found in Record.class.
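To illustrate what "resolve to the schema found in Record.class" means for the v1/v2 schemas in this thread, here is a simplified, standard-library model of name-based resolution. It is a sketch under stated assumptions, not Avro's implementation: `ReaderField` and `resolve` are hypothetical names, records are modeled as plain maps, and a `null` default is used to mean "no default declared". Each reader field is looked up in the written data by its own name, then by its aliases, and otherwise falls back to its default; writer fields the reader does not declare are dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NameResolutionSketch {

    /** Hypothetical reader-side field: name, optional default, optional aliases. */
    public static class ReaderField {
        final String name;
        final Object defaultValue; // null means "no default declared" in this sketch
        final List<String> aliases;

        public ReaderField(String name, Object defaultValue, String... aliases) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.aliases = Arrays.asList(aliases);
        }
    }

    /** Builds a reader-shaped record from writer data by matching names, not positions. */
    public static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                              List<ReaderField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<String, Object>();
        for (ReaderField field : readerFields) {
            String source = null;
            if (writerRecord.containsKey(field.name)) {
                source = field.name;
            } else {
                for (String alias : field.aliases) {
                    if (writerRecord.containsKey(alias)) {
                        source = alias;
                        break;
                    }
                }
            }
            if (source != null) {
                result.put(field.name, writerRecord.get(source));
            } else if (field.defaultValue != null) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No value or default for field " + field.name);
            }
        }
        // Writer fields with no reader counterpart (here "id") are dropped.
        return result;
    }

    public static void main(String[] args) {
        // One record as written with the v1 schema.
        Map<String, Object> v1Record = new LinkedHashMap<String, Object>();
        v1Record.put("name", "r1");
        v1Record.put("id", 1);

        // The v2 schema as the reader sees it.
        List<ReaderField> v2Fields = Arrays.asList(
                new ReaderField("name_rename", null, "name"),
                new ReaderField("new_field", 0));

        // prints "{name_rename=r1, new_field=0}"
        System.out.println(resolve(v1Record, v2Fields));
    }
}
```

With name-based matching, "new_field" takes its default instead of the old "id" value, which is the result Alex expected from the v2 classes.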
-
Re: Avro versioning and SpecificDatum's
Alex Holmes 2011-09-21, 23:55

Thanks, that fixed my issue.