Re: How to declare an optional field
Doug Cutting 2012-06-06, 17:23
According to the spec, the default value for a union is assumed to have
the type of the first element of the union.

http://avro.apache.org/docs/current/spec.html#schema_record

So some valid fields would be:

{"name":"x", "type":["long", "null"], "default": 0}
{"name":"y", "type":["null", "long"], "default": null}

The following are invalid fields, since the type of the default value
does not match that of the first union element.

{"name":"x", "type":["long", "null"], "default": null}
{"name":"y", "type":["null", "long"], "default": 0}

Python may not implement this strictly, but Java does.

This is a common point of confusion. We should probably document it
better. I'm not sure whether it's causing the problem you're seeing,
but perhaps it is.

Cheers,

Doug

On 06/06/2012 04:15 AM, François Kawala wrote:
> Dear all,
>
> Despite my desperate effort to get a working schema I can not manage to
> specify that a field of a given record can be either : "a given type" or
> "null". I've tried with unions but the back-end that I have to use seems
> to be unhappy with it. More precisely : I'm trying to output the result
> of a Streaming MR job within an AVRO container. This job is written in
> python an executed through dumbo (http://www.dumbotics.com), and a
> custom OutputFormat is used
> (https://github.com/tomslabs/avro-utils/tree/master/src/main/java/com/tomslabs/grid/avro)
>
> However since this custom OutputFormat relies on org.apache.avro
> sources, I've thought this list could be a good spot to call for help.
>
> Thanks for reading,
> François.
>
> ------------------------------------------------------------------------
>
> Here is some complementary elements :
>
> Fragment of the schema that I think to be responsible of my troubles :
>
> {"name": "in_reply_to", "type": [{"type": "long"},"null"], "default":"null"}
>
> I've also unsuccessfully tried :
>
> {"name": "in_reply_to", "type": [{"type": "long"},"null"]}
> {"name": "in_reply_to", "type": ["null",{"type": "long"}]}
>
> Each ending with the same error message :
>
> org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
>
> Error Stack :
>
> at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
> at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
> at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
> at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
> at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
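
The rule from the reply (a union field's default must have the type of the union's first branch) can be checked mechanically. The following is only a minimal sketch using the Python standard library, not the Avro library itself. The helper names json_value_types, branch_name and default_matches_first_branch are hypothetical, and the check covers primitive branches only (no records, enums, arrays, maps or fixed).

```python
import json

def json_value_types(value):
    """Hypothetical helper: Avro primitive type names a JSON default could stand for."""
    if value is None:
        return {"null"}
    if isinstance(value, bool):          # test bool before int: bool is a subclass of int
        return {"boolean"}
    if isinstance(value, int):
        return {"int", "long", "float", "double"}
    if isinstance(value, float):
        return {"float", "double"}
    if isinstance(value, str):
        return {"string", "bytes"}
    return set()                         # complex defaults are out of scope for this sketch

def branch_name(branch):
    """A branch may be written as "long" or as {"type": "long"}."""
    return branch.get("type") if isinstance(branch, dict) else branch

def default_matches_first_branch(field):
    """True if the field has no default, or the default fits the FIRST union branch."""
    if "default" not in field:
        return True
    ftype = field["type"]
    first = branch_name(ftype[0]) if isinstance(ftype, list) else branch_name(ftype)
    return first in json_value_types(field["default"])

# The four fields from the reply, followed by the field from the original question.
fields = [json.loads(s) for s in (
    '{"name":"x", "type":["long", "null"], "default": 0}',
    '{"name":"y", "type":["null", "long"], "default": null}',
    '{"name":"x", "type":["long", "null"], "default": null}',
    '{"name":"y", "type":["null", "long"], "default": 0}',
    '{"name": "in_reply_to", "type": [{"type": "long"},"null"], "default":"null"}',
)]
results = [default_matches_first_branch(f) for f in fields]
print(results)
```

Under these assumptions the first two fields pass and the last three fail. Note that the field from the question fails for two reasons: its first branch is long, and its default is the JSON string "null" rather than the JSON value null. Writing it as {"name": "in_reply_to", "type": ["null", "long"], "default": null} would satisfy the rule.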