Re: Picking up default value for a union?
Thank you both. Makes sense.
2013/4/11 Scott Carey <[EMAIL PROTECTED]>

> Minor addition: the default value should be
>
> null
>
> not
>
> "null"
>
> -- the latter is a string, the former is null.
>
> http://avro.apache.org/docs/current/spec.html#schema_record
>
>
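
A minimal sketch of the distinction, assuming an Avro 1.7-era Java API
(Schema.Parser, Schema.Field#defaultValue()); the class name UnionDefaultSketch
is made up for illustration:

    import org.apache.avro.Schema;

    public class UnionDefaultSketch {
        public static void main(String[] args) {
            // Correct: the default is the JSON literal null, which matches the
            // union's first branch ("null").
            String good = "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
                + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

            // Incorrect: "null" here is a JSON string; it does not match the
            // union's first branch, so the default cannot be applied (newer Avro
            // releases may reject it outright when validating defaults).
            String bad = "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
                + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":\"null\"}]}";

            Schema schema = new Schema.Parser().parse(good);
            System.out.println(schema.getField("a").defaultValue()); // prints: null
        }
    }
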
> On 4/9/13 8:42 PM, "Martin Kleppmann" <[EMAIL PROTECTED]> wrote:
>
> >With Avro, it is generally assumed that your reader is working with
> >the exact same schema as the data was written with. If you want to
> >change your schema, e.g. add a field to a record, you still need the
> >exact same schema as was used for writing (the "writer's schema"), but
> >you can also give the decoder a second schema (the "reader's schema"),
> >and Avro will map data from the writer's schema into the reader's
> >schema for you ("schema evolution").
> >
> >This requirement of having the exact same schema as the writer makes
> >more sense with Avro's binary encoding, because it allows Avro to omit
> >the field names, which makes the encoding very compact. The
> >requirement makes less sense if you're using the JSON encoding, where
> >field names are inevitably part of the JSON. I think this behaviour is
> >expected, but I agree that it's a bit surprising, so perhaps it's
> >worth discussing whether we should change it.
> >
> >To answer your question, your input data {} looks like it was written
> >with a writer schema of {"name":"hey", "type":"record", "fields":[]}
> >so try using that as your writer schema. Then if you specify
> >{"name":"hey", "type":"record",
> >"fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
> >your reader schema, you should find that the resolving decoder fills
> >in the field "a" with the default null.
> >
> >Best,
> >Martin
> >
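
A minimal end-to-end sketch of the resolution Martin describes, assuming an
Avro 1.7-era Java API (Schema.Parser, DecoderFactory, GenericDatumReader); the
class name ReaderDefaultSketch is made up for illustration:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;

    public class ReaderDefaultSketch {
        public static void main(String[] args) throws Exception {
            // Writer's schema: the schema the input {} was actually written with.
            Schema writer = new Schema.Parser().parse(
                "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}");

            // Reader's schema: adds field "a" with a null default.
            Schema reader = new Schema.Parser().parse(
                "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
                + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

            // Decode {} with the writer's schema and resolve it into the
            // reader's schema; the resolving decoder supplies the default.
            GenericDatumReader<GenericRecord> datumReader =
                new GenericDatumReader<GenericRecord>(writer, reader);
            Decoder decoder = DecoderFactory.get().jsonDecoder(writer, "{}");
            GenericRecord record = datumReader.read(null, decoder);

            System.out.println(record); // expected per the thread: {"a": null}
        }
    }
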
> >On 9 April 2013 02:44, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >> Stepping through the code, it looks like the code only uses defaults for
> >> writing, not for reading. I.e. at read time it assumes that the defaults
> >> were already filled in. It seems like if the reader evolved the schema to
> >> include new fields, it would be desirable for the defaults to get filled
> >> in if not present? But stepping through, on reading the defaults are
> >> completely ignored.
> >>
> >>
> >> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
> >>>
> >>> Please note: {"name":"hey", "type":"record",
> >>> "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
> >>> doesn't work
> >>>
> >>>
> >>> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
> >>>>
> >>>> I have the following schema: {"name":"hey", "type":"record",
> >>>> "fields":[{"name":"a","type":["null","string"],"default":null}]}
> >>>>
> >>>> I am trying to deserialize the following against this schema using Java
> >>>> and the GenericDatumReader: {}
> >>>>
> >>>> I get the following error:
> >>>> Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
> >>>>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
> >>>>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
> >>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> >>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> >>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
> >>>>     at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)
> >>>>
> >>>> I'm not seeing any immediate issues online around this...is this
> >>>> expected? I'm reading it in as such: