Avro >> mail # user >> Avro versioning and SpecificDatum's


Re: Avro versioning and SpecificDatum's
That looks like a bug.  What happens if there is no aliasing/renaming
involved?  Aliasing is a newer feature than field addition, removal, and
promotion.

This should be easy to reproduce; can you file a JIRA ticket?  We should
discuss this further there.

Thanks!
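
For concreteness, the no-aliasing case asked about above could be exercised with a v2 schema that only removes "id" and adds a field with a default, with no aliases involved; the added field name below is purely illustrative:

{"name": "Record", "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "new_field", "type": "int", "default": 0}
  ]
}

Per Avro's schema-resolution rules, reading a v1 file against this reader schema should ignore "id" and fill "new_field" from its default; whatever actually happens is worth recording on the JIRA ticket.
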
On 9/19/11 6:14 PM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:

>OK, I was able to reproduce the exception.
>
>v1:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>
>v2:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>  ]
>}
>
>Step 1.  Write Avro file using v1 generated class
>Step 2.  Read Avro file using v2 generated class
>
>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
> at Record.put(Unknown Source)
> at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
> at Read.readFromAvro(Unknown Source)
> at Read.main(Unknown Source)
>
>The code to write and read the Avro file is unchanged from what's quoted below.
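
The "Bad index" in the trace comes from the default branch of the switch that Avro's code generator emits in a record's get()/put(). Roughly paraphrased (this is not the exact generated source), the v2 class looks like:

public class Record extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = org.apache.avro.Schema.parse(
      "{\"name\":\"Record\",\"type\":\"record\",\"fields\":"
      + "[{\"name\":\"name_rename\",\"type\":\"string\",\"aliases\":[\"name\"]}]}");

  public java.lang.CharSequence name_rename;

  public org.apache.avro.Schema getSchema() { return SCHEMA$; }

  public java.lang.Object get(int field$) {
    switch (field$) {
    case 0: return name_rename;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }

  public void put(int field$, java.lang.Object value$) {
    switch (field$) {
    case 0: name_rename = (java.lang.CharSequence) value$; break;
    // A field position the class does not know about ends up here:
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }
}

The failure at Record.put in the trace would therefore be consistent with the decoder handing put() a position taken from the two-field v1 (writer) schema rather than the one-field v2 (reader) schema.
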
>
>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <[EMAIL PROTECTED]> wrote:
>> I'm trying to put together a simple test case to reproduce the
>> exception.  While I was creating the test case, I hit this behavior
>> which doesn't seem right, but maybe it's my misunderstanding on how
>> forward/backward compatibility should work:
>>
>> Schema v1:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name", "type": "string"},
>>    {"name": "id", "type": "int"}
>>  ]
>> }
>>
>> Schema v2:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>    {"name": "new_field", "type": "int", "default":"0"}
>>  ]
>> }
>>
>> In the 2nd version I:
>>
>> - removed field "id"
>> - renamed field "name" to "name_rename"
>> - added field "new_field"
>>
>> I write the v1 data file:
>>
>>  public static Record createRecord(String name, int id) {
>>    Record record = new Record();
>>    record.name = name;
>>    record.id = id;
>>    return record;
>>  }
>>
>>  public static void writeToAvro(OutputStream outputStream)
>>      throws IOException {
>>    DataFileWriter<Record> writer =
>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>    writer.create(Record.SCHEMA$, outputStream);
>>
>>    writer.append(createRecord("r1", 1));
>>    writer.append(createRecord("r2", 2));
>>
>>    writer.close();
>>    outputStream.close();
>>  }
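
Since an Avro data file stores the schema it was written with in its header, one cheap sanity check is to read that embedded writer schema back out and confirm it really is v1. A throwaway sketch (class and file names are just placeholders):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class PrintWriterSchema {
  public static void main(String[] args) throws Exception {
    // "records.avro" stands in for whatever file writeToAvro produced.
    InputStream in = new FileInputStream("records.avro");
    DataFileStream<GenericRecord> stream = new DataFileStream<GenericRecord>(
        in, new GenericDatumReader<GenericRecord>());
    try {
      // getSchema() returns the schema from the file header, i.e. the writer's
      // (v1) schema, independent of which generated classes are on the classpath.
      System.out.println(stream.getSchema().toString(true));
    } finally {
      stream.close();
      in.close();
    }
  }
}
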
>>
>> I wrote a version-agnostic Read class:
>>
>>  public static void readFromAvro(InputStream is) throws IOException {
>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>            is, new SpecificDatumReader<Record>());
>>    for (Record a : reader) {
>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>    }
>>    IOUtils.cleanup(null, is);
>>    IOUtils.cleanup(null, reader);
>>  }
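
For comparison, SpecificDatumReader can also be given the expected (reader) schema explicitly rather than using the no-arg form above. A sketch of that variant, assuming the v2 generated Record class and the same commons-lang ToStringBuilder (class and file names are placeholders):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.commons.lang.builder.ToStringBuilder;

public class ReadWithExplicitSchema {
  public static void main(String[] args) throws IOException {
    // "records.avro" is a placeholder for the file written with the v1 classes.
    InputStream is = new FileInputStream("records.avro");
    // Record.SCHEMA$ is the v2 class's schema; passing it pins the reader schema
    // so resolution against the file's writer schema goes through Avro's
    // name/alias matching rather than whatever the no-arg reader ends up using.
    SpecificDatumReader<Record> datumReader =
        new SpecificDatumReader<Record>(Record.SCHEMA$);
    DataFileStream<Record> reader = new DataFileStream<Record>(is, datumReader);
    for (Record a : reader) {
      System.out.println(ToStringBuilder.reflectionToString(a));
    }
    reader.close();
    is.close();
  }
}

Whether this form shows the same behaviour is worth noting on the JIRA ticket, since it helps separate "wrong default reader schema" from "wrong resolution".
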
>>
>> Running the Read code against the v1 data file, and including the v1
>> code-generated classes in the classpath produced:
>>
>> Record@6a8c436b[name=r1,id=1]
>> Record@6baa9f99[name=r2,id=2]
>>
>> If I run the same code, but use just the v2 generated classes in the
>> classpath I get:
>>
>> Record@39dd3812[name_rename=r1,new_field=1]
>> Record@27b15692[name_rename=r2,new_field=2]
>>
>> The name_rename field seems to be good, but why would "new_field"
>> inherit the values of the deleted field "id"?
>>
>> Cheers,
>> Alex
>>
>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>> On 09/19/2011 05:12 AM, Alex Holmes wrote: