Avro >> mail # user >> Avro versioning and SpecificDatum's


Re: Avro versioning and SpecificDatum's
That looks like a bug.  What happens if there is no aliasing/renaming
involved?  Aliasing is a newer feature than field addition, removal, and
promotion.

This should be easy to reproduce.  Can you file a JIRA ticket?  We should
discuss this further there.

Thanks!
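
One possible explanation for both symptoms in the quoted messages below, offered as an assumption and not a confirmed diagnosis: if the reader decodes using only the writer's (v1) schema, each decoded value is stored into the generated class by field position. Position 1 ("id") then lands in whatever the v2 class keeps at position 1, or fails when the v2 class has no such position. The sketch below is a simplified, Avro-free model of that idea. `FieldMappingSketch`, `Field`, `readPositional`, and `readResolved` are hypothetical names used for illustration only; none of them are Avro APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Avro-free model of positional vs. name-based field assignment.
class FieldMappingSketch {

    /** Stand-in for a schema field: a name, an optional default, optional aliases. */
    static final class Field {
        final String name;
        final Object defaultValue;
        final List<String> aliases;

        Field(String name, Object defaultValue, String... aliases) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.aliases = Arrays.asList(aliases);
        }
    }

    // Field layouts taken from the schemas quoted in this thread.
    static final List<Field> V1 = Arrays.asList(
            new Field("name", null), new Field("id", null));
    static final List<Field> V2 = Arrays.asList(
            new Field("name_rename", null, "name"), new Field("new_field", 0));
    static final List<Field> V2_ONE_FIELD = Arrays.asList(
            new Field("name_rename", null, "name"));

    /** Positional assignment: the i-th written value goes into the i-th reader slot. */
    static Object[] readPositional(List<Field> writer, Object[] written, List<Field> reader) {
        Object[] slots = new Object[reader.size()];
        for (int i = 0; i < writer.size(); i++) {
            if (i >= slots.length) {
                throw new IllegalStateException("Bad index: " + i);
            }
            slots[i] = written[i];
        }
        return slots;
    }

    /** Name-based assignment: match by name or alias, else fall back to the default. */
    static Object[] readResolved(List<Field> writer, Object[] written, List<Field> reader) {
        Object[] slots = new Object[reader.size()];
        for (int r = 0; r < reader.size(); r++) {
            Field readerField = reader.get(r);
            slots[r] = readerField.defaultValue;
            for (int w = 0; w < writer.size(); w++) {
                String writerName = writer.get(w).name;
                if (readerField.name.equals(writerName)
                        || readerField.aliases.contains(writerName)) {
                    slots[r] = written[w];
                    break;
                }
            }
        }
        return slots; // writer fields with no match (here "id") are simply dropped
    }

    public static void main(String[] args) {
        Object[] row = {"r1", 1};
        System.out.println(Arrays.toString(readPositional(V1, row, V2)));  // [r1, 1]
        System.out.println(Arrays.toString(readResolved(V1, row, V2)));    // [r1, 0]
        try {
            readPositional(V1, row, V2_ONE_FIELD);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());                            // Bad index: 1
        }
    }
}
```

In this model the positional path reproduces the "new_field" value of 1 and the "Bad index" failure, while the name-based path yields the renamed field plus the default. In real Avro code, name-based resolution needs the reader to know both the writer's schema and the reader's own schema, so it may be worth checking which schemas the reader in the quoted code actually ends up with.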
On 9/19/11 6:14 PM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:

>OK, I was able to reproduce the exception.
>
>v1:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>
>v2:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>  ]
>}
>
>Step 1.  Write Avro file using v1 generated class
>Step 2.  Read Avro file using v2 generated class
>
>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
> at Record.put(Unknown Source)
> at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
> at Read.readFromAvro(Unknown Source)
> at Read.main(Unknown Source)
>
>The code to write/read the avro file didn't change from below.
>
>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <[EMAIL PROTECTED]> wrote:
>> I'm trying to put together a simple test case to reproduce the
>> exception.  While I was creating the test case, I hit this behavior
>> which doesn't seem right, but maybe it's my misunderstanding on how
>> forward/backward compatibility should work:
>>
>> Schema v1:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name", "type": "string"},
>>    {"name": "id", "type": "int"}
>>  ]
>> }
>>
>> Schema v2:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>    {"name": "new_field", "type": "int", "default":"0"}
>>  ]
>> }
>>
>> In the 2nd version I:
>>
>> - removed field "id"
>> - renamed field "name" to "name_rename"
>> - added field "new_field"
>>
>> I write the v1 data file:
>>
>>  public static Record createRecord(String name, int id) {
>>    Record record = new Record();
>>    record.name = name;
>>    record.id = id;
>>    return record;
>>  }
>>
>>  public static void writeToAvro(OutputStream outputStream)
>>      throws IOException {
>>    DataFileWriter<Record> writer =
>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>    writer.create(Record.SCHEMA$, outputStream);
>>
>>    writer.append(createRecord("r1", 1));
>>    writer.append(createRecord("r2", 2));
>>
>>    writer.close();
>>    outputStream.close();
>>  }
>>
>> I wrote a version-agnostic Read class:
>>
>>  public static void readFromAvro(InputStream is) throws IOException {
>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>            is, new SpecificDatumReader<Record>());
>>    for (Record a : reader) {
>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>    }
>>    IOUtils.cleanup(null, is);
>>    IOUtils.cleanup(null, reader);
>>  }
>>
>> Running the Read code against the v1 data file, and including the v1
>> code-generated classes in the classpath produced:
>>
>> Record@6a8c436b[name=r1,id=1]
>> Record@6baa9f99[name=r2,id=2]
>>
>> If I run the same code, but use just the v2 generated classes in the
>> classpath I get:
>>
>> Record@39dd3812[name_rename=r1,new_field=1]
>> Record@27b15692[name_rename=r2,new_field=2]
>>
>> The name_rename field seems to be good, but why would "new_field"
>> inherit the values of the deleted field "id"?
>>
>> Cheers,
>> Alex
>>
>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>> On 09/19/2011 05:12 AM, Alex Holmes wrote: