Avro user mailing list - Avro versioning and SpecificDatum's


Re: Avro versioning and SpecificDatum's
Scott Carey 2011-09-20, 18:51
As Doug mentioned in the ticket, the problem is likely:

new SpecificDatumReader<Record>()

This should be

new SpecificDatumReader<Record>(Record.class)

which sets the reader to resolve to the schema found in Record.class.
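For reference, a minimal sketch of a read path with that fix applied; the wrapper class, file name, and loop here are illustrative assumptions, not code from this thread:

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

public class ReadFixed {
  public static void main(String[] args) throws IOException {
    // Passing Record.class makes the schema compiled into Record (the
    // reader's, i.e. newer, schema) the target of resolution against the
    // writer schema stored in the data file.
    SpecificDatumReader<Record> datumReader =
        new SpecificDatumReader<Record>(Record.class);
    DataFileReader<Record> fileReader =
        new DataFileReader<Record>(new File("records.avro"), datumReader);
    try {
      for (Record record : fileReader) {
        System.out.println(record);
      }
    } finally {
      fileReader.close();
    }
  }
}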

On 9/20/11 3:44 AM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:

>Created the following ticket:
>
>https://issues.apache.org/jira/browse/AVRO-891
>
>Thanks,
>Alex
>
>On Tue, Sep 20, 2011 at 6:26 AM, Alex Holmes <[EMAIL PROTECTED]> wrote:
>> Thanks, I'll add a bug.
>>
>> As an FYI, even without the alias (retaining the original field name),
>> just removing the "id" field yields the exception.
>>
>> On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <[EMAIL PROTECTED]>
>>wrote:
>>> That looks like a bug.  What happens if there is no aliasing/renaming
>>> involved?  Aliasing is a newer feature than field addition, removal,
>>> and promotion.
>>>
>>> This should be easy to reproduce; can you file a JIRA ticket?  We
>>> should discuss this further there.
>>>
>>> Thanks!
>>>
>>>
>>> On 9/19/11 6:14 PM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:
>>>
>>>>OK, I was able to reproduce the exception.
>>>>
>>>>v1:
>>>>{"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name", "type": "string"},
>>>>    {"name": "id", "type": "int"}
>>>>  ]
>>>>}
>>>>
>>>>v2:
>>>>{"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>>>  ]
>>>>}
>>>>
>>>>Step 1.  Write Avro file using v1 generated class
>>>>Step 2.  Read Avro file using v2 generated class
>>>>
>>>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>>>       at Record.put(Unknown Source)
>>>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>>       at Read.readFromAvro(Unknown Source)
>>>>       at Read.main(Unknown Source)
>>>>
>>>>The code to write/read the avro file didn't change from below.
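The read code referred to here is cut off in this archive. A minimal sketch of a read path using the no-arg SpecificDatumReader, which reproduces the failure above; the class name and file handling are hypothetical:

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

public class Read {
  public static void readFromAvro(File file) throws IOException {
    // With no reader class or schema given, the reader falls back to the
    // writer schema stored in the file (v1), so fields are pushed into the
    // generated v2 Record at v1 positions, and Record.put() throws
    // "Bad index" for a position the v2 class no longer has.
    DataFileReader<Record> fileReader = new DataFileReader<Record>(
        file, new SpecificDatumReader<Record>());
    try {
      while (fileReader.hasNext()) {
        System.out.println(fileReader.next());
      }
    } finally {
      fileReader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    readFromAvro(new File(args[0]));
  }
}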
>>>>
>>>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <[EMAIL PROTECTED]>
>>>>wrote:
>>>>> I'm trying to put together a simple test case to reproduce the
>>>>> exception.  While I was creating the test case, I hit this behavior
>>>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>>>> forward/backward compatibility should work:
>>>>>
>>>>> Schema v1:
>>>>>
>>>>> {"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name", "type": "string"},
>>>>>    {"name": "id", "type": "int"}
>>>>>  ]
>>>>> }
>>>>>
>>>>> Schema v2:
>>>>>
>>>>> {"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>>>    {"name": "new_field", "type": "int", "default":"0"}
>>>>>  ]
>>>>> }
>>>>>
>>>>> In the 2nd version I:
>>>>>
>>>>> - removed field "id"
>>>>> - renamed field "name" to "name_rename"
>>>>> - added field "new_field"
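As an aside, the resolution these three changes are meant to exercise can be checked without generated classes by handing both schemas to a GenericDatumReader. A minimal sketch, assuming the Avro 1.5+ Schema.Parser API; the schema strings condense the versions above, with the int default written as the number 0:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ResolveDemo {
  static final String V1 =
      "{\"name\":\"Record\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"id\",\"type\":\"int\"}]}";
  static final String V2 =
      "{\"name\":\"Record\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"name_rename\",\"type\":\"string\",\"aliases\":[\"name\"]},"
      + "{\"name\":\"new_field\",\"type\":\"int\",\"default\":0}]}";

  public static void main(String[] args) throws IOException {
    Schema writer = new Schema.Parser().parse(V1);
    Schema reader = new Schema.Parser().parse(V2);

    // Encode one record with the writer (v1) schema.
    GenericRecord rec = new GenericData.Record(writer);
    rec.put("name", "r1");
    rec.put("id", 1);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
    enc.flush();

    // Decode it resolving v1 against v2: "name" maps to "name_rename" via
    // the alias, "id" is dropped, and "new_field" takes its default.
    Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericRecord resolved =
        new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec);
    System.out.println(resolved);  // {"name_rename": "r1", "new_field": 0}
  }
}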
>>>>>
>>>>> I write the v1 data file:
>>>>>
>>>>>  public static Record createRecord(String name, int id) {
>>>>>    Record record = new Record();
>>>>>    record.name = name;
>>>>>    record.id = id;
>>>>>    return record;
>>>>>  }
>>>>>
>>>>>  public static void writeToAvro(OutputStream outputStream)
>>>>>      throws IOException {
>>>>>    DataFileWriter<Record> writer =
>>>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>>>    writer.create(Record.SCHEMA$, outputStream);
>>>>>
>>>>>    writer.append(createRecord("r1", 1));
>>>>>    writer.append(createRecord("r2", 2));
>>>>>
>>>>>    writer.close();