Re: Picking up default value for a union?
With Avro, it is generally assumed that your reader is working with
the exact same schema as the data was written with. If you want to
change your schema, e.g. add a field to a record, you still need the
exact same schema as was used for writing (the "writer's schema"), but
you can also give the decoder a second schema (the "reader's schema"),
and Avro will map data from the writer's schema into the reader's
schema for you ("schema evolution").
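For illustration, here's a rough (untested) sketch of that resolution with the binary encoding; the schema and the names in it are made up just for this example:

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EvolutionSketch {
    public static void main(String[] args) throws Exception {
        // The schema the data was originally written with.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"name\":\"rec\",\"type\":\"record\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
        // An evolved schema that adds a defaulted field.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"name\":\"rec\",\"type\":\"record\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"note\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

        // Write a record with the writer's schema (binary encoding).
        GenericData.Record rec = new GenericData.Record(writerSchema);
        rec.put("id", "x1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericData.Record>(writerSchema).write(rec, enc);
        enc.flush();

        // Read it back giving Avro both schemas: it resolves writer -> reader
        // and fills in "note" from the reader's default.
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        Object evolved =
            new GenericDatumReader<Object>(writerSchema, readerSchema).read(null, dec);
        System.out.println(evolved);  // should print {"id": "x1", "note": null}
    }
}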

This requirement of having the exact same schema as the writer makes
more sense with Avro's binary encoding, because it allows Avro to omit
the field names, which makes the encoding very compact. The
requirement makes less sense if you're using the JSON encoding, where
field names are inevitably part of the JSON. I think this behaviour is
expected, but I agree that it's a bit surprising, so perhaps it's
worth discussing whether we should change it.

To answer your question, your input data {} looks like it was written
with a writer schema of {"name":"hey", "type":"record", "fields":[]},
so try using that as your writer schema. Then if you specify
{"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
your reader schema, you should find that the resolving decoder fills
in the field "a" with the default null.
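In code, that would look something like this (a rough sketch, not tested; the class name is made up, and I've used the JSON null default from your first schema):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class JsonDefaultSketch {
    public static void main(String[] args) throws Exception {
        // Writer schema: what {} was actually written with (no fields).
        Schema writerSchema = new Schema.Parser().parse(
            "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}");
        // Reader schema: adds field "a" with a JSON null default.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"name\":\"hey\",\"type\":\"record\",\"fields\":"
            + "[{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

        // Decode the JSON against the writer's schema, resolving into the reader's,
        // so the missing field gets its default.
        GenericDatumReader<Object> reader =
            new GenericDatumReader<Object>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, "{}");
        Object datum = reader.read(null, decoder);
        System.out.println(datum);  // should print {"a": null}
    }
}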

Best,
Martin

On 9 April 2013 02:44, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> Stepping through the code, it looks like it only uses defaults for
> writing, not for reading, i.e. at read time it assumes the defaults were
> already filled in. If the reader evolved the schema to include new
> fields, wouldn't it be desirable for the defaults to be filled in when
> they're not present? But stepping through, on reading the defaults are
> completely ignored.
>
>
> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
>>
>> Please note: {"name":"hey", "type":"record",
>> "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
>> doesn't work
>>
>>
>> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
>>>
>>> I have the following schema: {"name":"hey", "type":"record",
>>> "fields":[{"name":"a","type":["null","string"],"default":null}]}
>>>
>>> I am trying to deserialize the following against this schema using Java
>>> and the GenericDatumReader: {}
>>>
>>> I get the following error:
>>> Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
>>>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
>>>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>>     at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)
>>>
>>> I'm not seeing any mention of this issue online... is this
>>> expected? I'm reading it in as follows:
>>>
>>> Schema avroSchema = new Schema.Parser().parse(schemaLine);
>>> GenericDatumReader<Object> reader = new GenericDatumReader<Object>(avroSchema);
>>> Object datum = reader.read(null, DecoderFactory.get().jsonDecoder(avroSchema, dataLine));
>>>
>>> I'm going to dig into why it isn't picking up the default, but I
>>> figured you guys might already know what's up?
>>>
>>> Thanks,
>>> Jon
>>
>>
>