Avro, mail # user - Picking up default value for a union?


Jonathan Coveney 2013-04-09, 09:06
Jonathan Coveney 2013-04-09, 09:31
Jonathan Coveney 2013-04-09, 09:44
Martin Kleppmann 2013-04-10, 03:42
Re: Picking up default value for a union?
Scott Carey 2013-04-11, 04:21
Minor addition, the default value should be

null

not

"null"

-- the latter is a string, the former is null.

http://avro.apache.org/docs/current/spec.html#schema_record
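To make the distinction concrete: per the spec linked above, the default for a union-typed field has to match the first branch of the union. The following is only a sketch of that rule in plain Java, not Avro's own validation code; the class `UnionDefaultSketch` and both helper methods are hypothetical names, and only the two cases from this thread (JSON null and a JSON string) are modelled.

```java
import java.util.Arrays;
import java.util.List;

class UnionDefaultSketch {
    // Hypothetical helper, not an Avro API. Takes an already-parsed JSON default
    // (Java null for JSON null, a String for a JSON string) and returns the
    // Avro type name that value could satisfy.
    static String jsonTypeOf(Object jsonDefault) {
        if (jsonDefault == null) {
            return "null";
        }
        if (jsonDefault instanceof String) {
            return "string";
        }
        throw new IllegalArgumentException("not modelled in this sketch");
    }

    // True when the default matches the FIRST branch of the union.
    static boolean defaultFitsUnion(List<String> unionBranches, Object jsonDefault) {
        return unionBranches.get(0).equals(jsonTypeOf(jsonDefault));
    }

    public static void main(String[] args) {
        List<String> union = Arrays.asList("null", "string");
        System.out.println(defaultFitsUnion(union, null));   // true: JSON null
        System.out.println(defaultFitsUnion(union, "null")); // false: a four-character string
    }
}
```

With the union ["null","string"], the string "null" would only fit the second branch, which is why it is not a usable default here.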
On 4/9/13 8:42 PM, "Martin Kleppmann" <[EMAIL PROTECTED]> wrote:

>With Avro, it is generally assumed that your reader is working with
>the exact same schema as the data was written with. If you want to
>change your schema, e.g. add a field to a record, you still need the
>exact same schema as was used for writing (the "writer's schema"), but
>you can also give the decoder a second schema (the "reader's schema"),
>and Avro will map data from the writer's schema into the reader's
>schema for you ("schema evolution").
>
>This requirement of having the exact same schema as the writer makes
>more sense with Avro's binary encoding, because it allows Avro to omit
>the field names, which makes the encoding very compact. The
>requirement makes less sense if you're using the JSON encoding, where
>field names are inevitably part of the JSON. I think this behaviour is
>expected, but I agree that it's a bit surprising, so perhaps it's
>worth discussing whether we should change it.
>
>To answer your question, your input data {} looks like it was written
>with a writer schema of {"name":"hey", "type":"record", "fields":[]}
>so try using that as your writer schema. Then if you specify
>{"name":"hey", "type":"record",
>"fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
>your reader schema, you should find that the resolving decoder fills
>in the field "a" with the default null.
>
>Best,
>Martin
>
>On 9 April 2013 02:44, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> Stepping through the code, it looks like the code only uses defaults for
>> writing, not for reading, i.e. at read time it assumes that the defaults
>> were already filled in. It seems like if the reader evolved the schema to
>> include new fields, it would be desirable for the defaults to get filled
>> in if not present? But stepping through, on reading the defaults are
>> completely ignored.
>>
>>
>> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
>>>
>>> Please note: {"name":"hey", "type":"record",
>>> "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
>>> doesn't work
>>>
>>>
>>> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
>>>>
>>>> I have the following schema: {"name":"hey", "type":"record",
>>>> "fields":[{"name":"a","type":["null","string"],"default":null}]}
>>>>
>>>> I am trying to deserialize the following against this schema using Java
>>>> and the GenericDatumReader: {}
>>>>
>>>> I get the following error:
>>>> Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
>>>>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
>>>>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
>>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>>>     at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)
>>>>
>>>> I'm not seeing any immediate issues online around this...is this
>>>> expected? I'm reading it in as such:
>>>>
>>>> Schema avroSchema = new Schema.Parser().parse(schemaLine);
>>>> GenericDatumReader<Object> reader = new GenericDatumReader<Object>(avroSchema);
>>>> Object datum = reader.read(null, DecoderFactory.get().jsonDecoder(avroSchema, dataLine));
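The resolution step Martin describes in the quoted reply (decode with the writer's schema, then map into the reader's schema and fill reader-only fields from their defaults) can be pictured with a small stand-alone sketch. This is plain Java with maps standing in for schemas and records; `ResolveSketch` and `resolve` are hypothetical names, not Avro classes, and the sketch assumes every reader-only field has a default.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class ResolveSketch {
    // Hypothetical helper, not an Avro API. For each field in the reader's
    // schema: take the written value if the writer's schema had that field,
    // otherwise fall back to the reader's default for it.
    static Map<String, Object> resolve(Set<String> writerFields,
                                       Map<String, Object> writtenValues,
                                       Map<String, Object> readerFieldDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFieldDefaults.entrySet()) {
            String name = field.getKey();
            if (writerFields.contains(name)) {
                result.put(name, writtenValues.get(name));
            } else {
                result.put(name, field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Writer's schema from the thread: record "hey" with no fields; datum is {}.
        Set<String> writerFields = Collections.emptySet();
        Map<String, Object> written = Collections.emptyMap();

        // Reader's schema adds field "a" with default null.
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("a", null);

        Map<String, Object> record = resolve(writerFields, written, readerDefaults);
        System.out.println(record); // {a=null}
    }
}
```

In Avro's Java API itself, the two schemas would typically be supplied as `new GenericDatumReader<Object>(writerSchema, readerSchema)`, with the JSON decoder built from the writer's schema rather than the reader's. The snippet quoted above passes a single schema, so that schema is used as both writer and reader.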
Jonathan Coveney 2013-04-11, 22:22