Avro >> mail # user >> Picking up default value for a union?


Re: Picking up default value for a union?
Thank you both. Makes sense.
2013/4/11 Scott Carey <[EMAIL PROTECTED]>

> Minor addition, the default value should be
>
> null
>
> not
>
> "null"
>
> -- the latter is a string, the former is null.
>
> http://avro.apache.org/docs/current/spec.html#schema_record
>
>
> On 4/9/13 8:42 PM, "Martin Kleppmann" <[EMAIL PROTECTED]> wrote:
>
> >With Avro, it is generally assumed that your reader is working with
> >the exact same schema as the data was written with. If you want to
> >change your schema, e.g. add a field to a record, you still need the
> >exact same schema as was used for writing (the "writer's schema"), but
> >you can also give the decoder a second schema (the "reader's schema"),
> >and Avro will map data from the writer's schema into the reader's
> >schema for you ("schema evolution").
> >
> >This requirement of having the exact same schema as the writer makes
> >more sense with Avro's binary encoding, because it allows Avro to omit
> >the field names, which makes the encoding very compact. The
> >requirement makes less sense if you're using the JSON encoding, where
> >field names are inevitably part of the JSON. I think this behaviour is
> >expected, but I agree that it's a bit surprising, so perhaps it's
> >worth discussing whether we should change it.
> >
> >To answer your question, your input data {} looks like it was written
> >with a writer schema of {"name":"hey", "type":"record", "fields":[]}
> >so try using that as your writer schema. Then if you specify
> >{"name":"hey", "type":"record",
> >"fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
> >your reader schema, you should find that the resolving decoder fills
> >in the field "a" with the default null.
> >
> >Best,
> >Martin
> >
> >On 9 April 2013 02:44, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >> Stepping through the code, it looks like the code only uses defaults for
> >> writing, not for reading. IE at read time it assumes that the defaults were
> >> already filled in. It seems like if the reader evolved the schema to include
> >> new fields, it would be desirable for the defaults to get filled in if not
> >> present? But stepping through, on reading the defaults are completely
> >> ignored.
> >>
> >>
> >> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
> >>>
> >>> Please note: {"name":"hey", "type":"record",
> >>> "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
> >>> doesn't work
> >>>
> >>>
> >>> 2013/4/9 Jonathan Coveney <[EMAIL PROTECTED]>
> >>>>
> >>>> I have the following schema: {"name":"hey", "type":"record",
> >>>> "fields":[{"name":"a","type":["null","string"],"default":null}]}
> >>>>
> >>>> I am trying to deserialize the following against this schema using Java
> >>>> and the GenericDatumReader: {}
> >>>>
> >>>> I get the following error:
> >>>> Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
> >>>>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
> >>>>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
> >>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> >>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> >>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> >>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
> >>>>     at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)
> >>>>
> >>>> I'm not seeing any immediate issues online around this...is this
> >>>> expected? I'm reading it in as such:
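
[Editor's note] The fix described in the thread — decode {} with the empty record as the writer schema, and supply the evolved record (with the defaulted union field) as the reader schema — can be sketched roughly as follows. This is a hypothetical stand-in for the thread's JsonTester, which was never posted; the class and method names are invented here, and it assumes the Avro Java API seen in the stack trace above (GenericDatumReader, DecoderFactory):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class JsonDefaultDemo {
    // Decodes the JSON text "{}" against an empty writer schema, then
    // resolves it into a reader schema that adds a ["null","string"]
    // field "a" with a default, so the resolving decoder fills it in.
    static GenericRecord readWithDefault() throws Exception {
        Schema writer = new Schema.Parser().parse(
            "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}");
        // Per Scott's note: the default is the JSON literal null,
        // not the string "null".
        Schema reader = new Schema.Parser().parse(
            "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],"
            + "\"default\":null}]}");
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<>(writer, reader);
        return datumReader.read(null,
            DecoderFactory.get().jsonDecoder(writer, "{}"));
    }

    public static void main(String[] args) throws Exception {
        GenericRecord rec = readWithDefault();
        System.out.println(rec.get("a")); // the default: null
    }
}
```

By contrast, passing the evolved schema as the writer schema (i.e. the same schema on both sides, as in the original attempt) makes the JSON decoder expect field "a" to be physically present in the input, which is what produces the "Expected start-union. Got END_OBJECT" error above.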