Avro, mail # user - How to declare an optional field

François Kawala 2012-06-06, 11:15
François Kawala 2012-06-07, 08:48
Doug Cutting 2012-06-07, 17:03
Re: How to declare an optional field
Doug Cutting 2012-06-06, 17:23
According to the spec, the default value for a union is assumed to have
the type of the first element of the union.


So some valid fields would be:

{"name":"x", "type":["long", "null"], "default": 0}
{"name":"y", "type":["null", "long"], "default": null}

The following are invalid fields, since the type of the default value
does not match that of the first union element.

{"name":"x", "type":["long", "null"], "default": null}
{"name":"y", "type":["null", "long"], "default": 0}

Python may not implement this strictly, but Java does.
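
The first-element rule above can be sketched as a small standalone check. This is only an illustration written against the rule as stated in this message, not code from the Avro library: the function name union_default_ok and the _CHECKS table are hypothetical, and only primitive first branches are handled.

```python
import json

# JSON-side checks for Avro primitive type names (illustrative, not exhaustive).
_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),
}

_NO_DEFAULT = object()  # sentinel: tells "no default" apart from "default": null


def union_default_ok(field_json):
    """Return True if a union field's default fits the union's FIRST branch.

    Hypothetical helper: fields that are not unions, or that declare no
    default, are reported as fine because there is nothing to compare.
    """
    field = json.loads(field_json)
    field_type = field["type"]
    default = field.get("default", _NO_DEFAULT)
    if not isinstance(field_type, list) or default is _NO_DEFAULT:
        return True
    first = field_type[0]
    # A branch may be written as "long" or as {"type": "long"}.
    name = first["type"] if isinstance(first, dict) else first
    if name not in _CHECKS:
        raise ValueError("only primitive first branches are handled: %r" % (name,))
    return _CHECKS[name](default)


print(union_default_ok('{"name":"x", "type":["long", "null"], "default": 0}'))
print(union_default_ok('{"name":"y", "type":["null", "long"], "default": 0}'))
```

Note that under this check the string "null" is not the same as a JSON null, so a field whose first branch is "long" and whose default is "null" fails for two reasons: the default is a string, and it would have to match "long" rather than "null".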

This is a common point of confusion.  We should probably document it
better.  I'm not sure whether it's causing the problem you're seeing,
but perhaps it is.



On 06/06/2012 04:15 AM, François Kawala wrote:
> Dear all,
> Despite my desperate efforts to get a working schema, I cannot manage to
> specify that a field of a given record can be either "a given type" or
> "null". I've tried unions, but the back-end that I have to use seems
> to be unhappy with them. More precisely: I'm trying to write the output
> of a Streaming MR job to an Avro container. This job is written in
> Python and executed through dumbo (http://www.dumbotics.com), and a
> custom OutputFormat is used
> (https://github.com/tomslabs/avro-utils/tree/master/src/main/java/com/tomslabs/grid/avro)
> However, since this custom OutputFormat relies on org.apache.avro
> sources, I thought this list could be a good place to ask for help.
> Thanks for reading,
> François.
> ------------------------------------------------------------------------
> Here are some complementary elements:
> Fragment of the schema that I think is responsible for my troubles:
> {"name": "in_reply_to", "type": [{"type": "long"},"null"], "default":"null"}
> I've also tried, unsuccessfully:
> {"name": "in_reply_to", "type": [{"type": "long"},"null"]}
> {"name": "in_reply_to", "type": ["null",{"type": "long"}]}
>     Each ends with the same error message:
>         org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
>     Error Stack :
>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>     at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
>     at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
>     at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
>     at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
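
One possible reading of the trace above: the exception is raised from JsonDecoder.readIndex, which suggests the record is being parsed as JSON text. In Avro's JSON encoding, a null union value is written as null, while any other union value is written as a single-key object naming the branch, for example {"long": 12345} rather than a bare 12345. Whether this is what the custom OutputFormat receives is an assumption; the helper below, with the hypothetical name to_avro_json_union, only sketches that wrapping on the Python side.

```python
import json


def to_avro_json_union(value, branch):
    """Wrap a value the way Avro's JSON encoding writes a union member.

    Hypothetical helper: `branch` is the Avro type name of the non-null
    union branch (e.g. "long"). None is passed through as JSON null.
    """
    if value is None:
        return None
    return {branch: value}


# A record whose in_reply_to field has the union type ["null", "long"].
with_reply = {"in_reply_to": to_avro_json_union(12345, "long")}
without_reply = {"in_reply_to": to_avro_json_union(None, "long")}

print(json.dumps(with_reply))     # {"in_reply_to": {"long": 12345}}
print(json.dumps(without_reply))  # {"in_reply_to": null}
```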