Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union
Hey,
I've been running into a case where I have a field of type int but I need to allow null values. To do this, I now have a new schema that defines that field as a union of null and int:
type: [ "null", "int" ]
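Written out in full, the field declaration (as in the reader's schema in the test below) is:

    { "name": "test", "type": ["null", "int"], "default": null }

Note that "null" has to come first in the union for the null default to be valid, since a union field's default must match the union's first type.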
According to my interpretation of the spec, Avro should resolve this correctly. For reference, the relevant rule reads as follows (from http://avro.apache.org/docs/current/spec.html#Schema+Resolution):
> if reader's is a union, but writer's is not
> The first schema in the reader's union that matches the writer's schema is recursively resolved against it. If none match, an error is signaled.
However, when I actually try this, I get:
org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.
I've written a simple test that illustrates the problem:

    // Imports required to compile the test (JUnit 4 and Avro):
    import static org.junit.Assert.assertEquals;

    import java.io.ByteArrayOutputStream;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.io.JsonDecoder;
    import org.apache.avro.io.JsonEncoder;
    import org.junit.Test;

    @Test
    public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
        // Writer's schema: "test" is a plain int.
        Schema writerSchema = new Schema.Parser().parse("{\n" +
                "    \"type\":\"record\",\n" +
                "    \"name\":\"NeighborComparisons\",\n" +
                "    \"fields\": [\n" +
                "      {\"name\": \"test\",\n" +
                "      \"type\": \"int\" }]} ");
        // Reader's schema: "test" is a union of null and int.
        Schema readersSchema = new Schema.Parser().parse(" {\n" +
                "    \"type\":\"record\",\n" +
                "    \"name\":\"NeighborComparisons\",\n" +
                "    \"fields\": [ {\n" +
                "      \"name\": \"test\",\n" +
                "      \"type\": [\"null\", \"int\"],\n" +
                "      \"default\": null } ]  }");

        // Serialize a record using the writer's schema and the JSON encoding.
        GenericData.Record record = new GenericData.Record(writerSchema);
        record.put("test", Integer.valueOf(10));
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(writerSchema, output);
        GenericDatumWriter<GenericData.Record> writer =
                new GenericDatumWriter<GenericData.Record>(writerSchema);
        writer.write(record, jsonEncoder);
        jsonEncoder.flush();
        output.flush();
        System.out.println(output.toString());

        // Deserialize with the reader's schema; this is where the
        // AvroTypeException is thrown.
        JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(readersSchema, output.toString());
        GenericDatumReader<GenericData.Record> reader =
                new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
        GenericData.Record read = reader.read(null, jsonDecoder);

        assertEquals(10, read.get("test"));
    }
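For comparison, here's a minimal sketch (not part of my original test, but using the same record and schemas as above) that round-trips through the binary encoding instead; there, the resolution rule quoted above does get applied and the value comes back fine:

    // Sketch: assumes the same writerSchema, readersSchema and record as in
    // the test above. Also needs org.apache.avro.io.BinaryEncoder and
    // org.apache.avro.io.BinaryDecoder.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder binaryEncoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericData.Record>(writerSchema).write(record, binaryEncoder);
    binaryEncoder.flush();

    BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericDatumReader<GenericData.Record> binaryReader =
            new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
    GenericData.Record result = binaryReader.read(null, binaryDecoder);
    // result.get("test") is the Integer 10, resolved against the "int"
    // branch of the reader's ["null", "int"] union.

If the binary path resolves this but the JSON path doesn't, that makes me suspect the JSON decoder rather than the resolution logic itself.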

Am I misunderstanding how Avro should handle this case of schema resolution, or is the problem in the implementation?
Cheers!
--
Alex

Replies:
Scott Carey 2012-08-31, 06:01
Alexandre Normand 2012-08-31, 06:06
Scott Carey 2012-08-31, 16:23
Doug Cutting 2012-08-31, 21:22
Alexandre Normand 2012-08-31, 21:31