Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union


Alexandre Normand 2012-08-17, 00:59
Scott Carey 2012-08-31, 06:01
Alexandre Normand 2012-08-31, 06:06
Scott Carey 2012-08-31, 16:23
Doug Cutting 2012-08-31, 21:22
Re: Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union
It makes sense when you think about it. I guess naming that parameter writerSchema rather than the generic schema would have made it even clearer, but at least I know now.

Thanks Doug!

--
Alex
On Friday, August 31, 2012 at 2:22 PM, Doug Cutting wrote:

> I responded to the Jira, but will respond here too for completeness.
>
> I believe the problem is that the decoder is incorrectly constructed
> with the reader's schema rather than the writer's schema. It should
> instead be constructed in this example with:
>
> JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(writerSchema, output.toString());
>
> With that change this test passes for me.
>
> Doug
>
> > On Fri, Aug 31, 2012 at 9:23 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> > Yes, please file a bug in JIRA. It will get more attention there.
> >
> > On 8/30/12 11:06 PM, "Alexandre Normand" <[EMAIL PROTECTED]> wrote:
> >
> > That's one of the things I've tried already. I've reversed the order to
> > ["int", "null"] but I get the same result.
> >
> > Should I file a bug in Jira?
> >
> > --
> > Alex
> >
> > On Thursday, August 30, 2012 at 11:01 PM, Scott Carey wrote:
> >
> > My understanding of the spec is that promotion to a union should work as
> > long as the prior type is a member of the union.
> >
> > What happens if the order of the union in the reader's schema is reversed?
> >
> > This may be a bug.
> >
> > -Scott
> >
> > On 8/16/12 5:59 PM, "Alexandre Normand" <[EMAIL PROTECTED]> wrote:
> >
> >
> > Hey,
> > I've been running into this case where I have a field of type int but I
> > need to allow for null values. To do this, I now have a new schema that
> > defines that field as a union of
> > null and int such as:
> > type: [ "null", "int" ]
> > According to my interpretation of the spec, avro should resolve this
> > correctly. For reference, the relevant rule reads as follows (from
> > http://avro.apache.org/docs/current/spec.html#Schema+Resolution):
> >
> > if reader's is a union, but writer's is not:
> > The first schema in the reader's union that matches the writer's schema
> > is recursively resolved against it. If none match, an error is signaled.
> >
> >
> > However, when trying to do this, I get this:
> > org.apache.avro.AvroTypeException: Attempt to process a int when a union
> > was expected.
> >
> > I've written a simple test that illustrates what I'm saying:
> > @Test
> > public void testReadingUnionFromValueWrittenAsPrimitive() throws
> > Exception {
> > Schema writerSchema = new Schema.Parser().parse("{\n" +
> > " \"type\":\"record\",\n" +
> > " \"name\":\"NeighborComparisons\",\n" +
> > " \"fields\": [\n" +
> > " {\"name\": \"test\",\n" +
> > " \"type\": \"int\" }]} ");
> > Schema readersSchema = new Schema.Parser().parse(" {\n" +
> > " \"type\":\"record\",\n" +
> > " \"name\":\"NeighborComparisons\",\n" +
> > " \"fields\": [ {\n" +
> > " \"name\": \"test\",\n" +
> > " \"type\": [\"null\", \"int\"],\n" +
> > " \"default\": null } ] }");
> > GenericData.Record record = new GenericData.Record(writerSchema);
> > record.put("test", Integer.valueOf(10));
> >
> > ByteArrayOutputStream output = new ByteArrayOutputStream();
> > JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(writerSchema, output);
> > GenericDatumWriter<GenericData.Record> writer = new
> > GenericDatumWriter<GenericData.Record>(writerSchema);
> > writer.write(record, jsonEncoder);
> > jsonEncoder.flush();
> > output.flush();
> >
> > System.out.println(output.toString());
> >
> > JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(readersSchema, output.toString());
> > GenericDatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
> > GenericData.Record read = reader.read(null, jsonDecoder);
> > assertEquals(10, read.get("test"));
> > }
> >
> > Am I misunderstanding how avro should handle such a case of schema
> > resolution?
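
For reference, a minimal sketch of the change Doug describes, assuming the writerSchema, readersSchema, and output variables from the test above: the only difference from the original test is that the JsonDecoder is built with the writer's schema, while the GenericDatumReader still takes both schemas so it can resolve the writer's int against the reader's ["null", "int"] union.

// Assumes writerSchema, readersSchema and output from the test above, plus
// the usual Avro imports (org.apache.avro.io.JsonDecoder,
// org.apache.avro.io.DecoderFactory, org.apache.avro.generic.*).
// Decode with the writer's schema (the schema the data was encoded with),
// then let the datum reader resolve it against the reader's schema.
JsonDecoder jsonDecoder =
    DecoderFactory.get().jsonDecoder(writerSchema, output.toString());
GenericDatumReader<GenericData.Record> reader =
    new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
GenericData.Record read = reader.read(null, jsonDecoder);
assertEquals(10, read.get("test"));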