Avro >> mail # user >> Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union


Re: Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union
My understanding of the spec is that promotion to a union should work as
long as the writer's type is a member of the union.

What happens if the order of the union in the reader's schema is reversed?

This may be a bug.

-Scott

On 8/16/12 5:59 PM, "Alexandre Normand" <[EMAIL PROTECTED]>
wrote:
>Hey,
>I've been running into this case where I have a field of type int but I
>need to allow for null values. To do this, I now have a new schema that
>defines that field as a union of
>null and int such as:
>type: [ "null", "int" ]
>According to my interpretation of the spec, avro should resolve this
>correctly. For reference, this reads like this (from
>http://avro.apache.org/docs/current/spec.html#Schema+Resolution):
>
>if
> reader's is a union, but writer's is not
>The first schema in the reader's union that matches the writer's schema
>is recursively resolved against it. If none match, an error is signaled.
>
>
>However, when trying to do this, I get this:
>org.apache.avro.AvroTypeException: Attempt to process a int when a union
>was expected.
>
>I've written a simple test that illustrates what I'm saying:
>    @Test
>    public void testReadingUnionFromValueWrittenAsPrimitive() throws
>Exception {
>        Schema writerSchema = new Schema.Parser().parse("{\n" +
>                "    \"type\":\"record\",\n" +
>                "    \"name\":\"NeighborComparisons\",\n" +
>                "    \"fields\": [\n" +
>                "      {\"name\": \"test\",\n" +
>                "      \"type\": \"int\" }]} ");
>        Schema readersSchema = new Schema.Parser().parse(" {\n" +
>                "    \"type\":\"record\",\n" +
>                "    \"name\":\"NeighborComparisons\",\n" +
>                "    \"fields\": [ {\n" +
>                "      \"name\": \"test\",\n" +
>                "      \"type\": [\"null\", \"int\"],\n" +
>                "      \"default\": null } ]  }");
>        GenericData.Record record = new GenericData.Record(writerSchema);
>        record.put("test", Integer.valueOf(10));
>
>        ByteArrayOutputStream output = new ByteArrayOutputStream();
>        JsonEncoder jsonEncoder =
>EncoderFactory.get().jsonEncoder(writerSchema, output);
>        GenericDatumWriter<GenericData.Record> writer = new
>GenericDatumWriter<GenericData.Record>(writerSchema);
>        writer.write(record, jsonEncoder);
>        jsonEncoder.flush();
>        output.flush();
>
>        System.out.println(output.toString());
>
>        JsonDecoder jsonDecoder =
>DecoderFactory.get().jsonDecoder(readersSchema, output.toString());
>        GenericDatumReader<GenericData.Record> reader =
>                new GenericDatumReader<GenericData.Record>(writerSchema,
>readersSchema);
>        GenericData.Record read = reader.read(null, jsonDecoder);
>        
>        assertEquals(10, read.get("test"));
>    }
>
>Am I misunderstanding how avro should handle such a case of schema
>resolution or is the problem in the implementation?
>
>Cheers!
>
>--
>Alex
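[Editor's note: the spec rule quoted above can be sketched in plain Java. This is an illustration of the resolution rule only, not Avro's actual implementation: when the reader's schema is a union and the writer's is not, the first branch of the reader's union that matches the writer's schema is selected. The `resolve` helper below is hypothetical.]

```java
import java.util.Arrays;
import java.util.List;

public class UnionResolution {
    // Sketch of the Schema Resolution rule quoted from the spec
    // (an illustration, not Avro's implementation): when the reader's
    // schema is a union and the writer's is not, the first branch of
    // the reader's union matching the writer's schema is used.
    static String resolve(String writerType, List<String> readerUnion) {
        for (String branch : readerUnion) {
            if (branch.equals(writerType)) {
                return branch; // first matching branch wins
            }
        }
        throw new IllegalArgumentException(
                "no branch of " + readerUnion + " matches " + writerType);
    }

    public static void main(String[] args) {
        // Writer wrote a plain "int"; reader declares ["null", "int"].
        System.out.println(resolve("int", Arrays.asList("null", "int")));
        // Union order reversed, as Scott suggests trying above:
        // by the spec rule it should still resolve to "int".
        System.out.println(resolve("int", Arrays.asList("int", "null")));
    }
}
```

By this rule, both union orders should accept the writer's `int`, so the exception in the test above suggests the failure lies in the decoding path rather than in resolution itself. One thing worth checking (an assumption, not confirmed in this thread): `JsonDecoder` is constructed here with `readersSchema`, but the JSON in `output` was written with `writerSchema`; constructing the decoder with the writer's schema and letting `GenericDatumReader(writerSchema, readersSchema)` perform the resolution may behave differently.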