NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Avro >> mail # user >> How to declare an optional field


Re: How to declare an optional field
According to the spec, the default value for a union is assumed to have
the type of the first element of the union.

http://avro.apache.org/docs/current/spec.html#schema_record

So some valid fields would be:

{"name":"x", "type":["long", "null"], "default": 0}
{"name":"y", "type":["null", "long"], "default": null}

The following are invalid fields, since the type of the default value
does not match that of the first union element.

{"name":"x", "type":["long", "null"], "default": null}
{"name":"y", "type":["null", "long"], "default": 0}

The Python implementation may not enforce this strictly, but the Java
implementation does.
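
To make the rule concrete, here is a minimal sketch in plain Python (standard library only, not the Avro API) that checks whether a field's default has the type of the first union branch. The helper names branch_name and default_matches_first_branch are made up for illustration, and only a handful of primitive types are covered.

```python
import json

# Checks on the parsed JSON default, keyed by Avro primitive type name.
# bool is excluded from the numeric checks because Python treats it as an int.
_PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def branch_name(branch):
    """Return the type name of a branch written as "long" or {"type": "long"}."""
    return branch["type"] if isinstance(branch, dict) else branch


def default_matches_first_branch(field_json):
    """Return True if the field's default has the type of the first union branch.

    Hypothetical helper: it only understands the primitive types listed above
    and raises ValueError for anything else.
    """
    field = json.loads(field_json)
    if "default" not in field:
        return True  # no default declared, so there is nothing to check
    ftype = field["type"]
    # For a union (a JSON array) only the first branch matters for the default.
    first = branch_name(ftype[0]) if isinstance(ftype, list) else branch_name(ftype)
    check = _PRIMITIVE_CHECKS.get(first)
    if check is None:
        raise ValueError("type not covered by this sketch: %r" % (first,))
    return check(field["default"])
```

Under these assumptions the two valid fields above return True and the two invalid ones return False. The fragment from the original message, with "default":"null", also returns False: the default there is the JSON string "null", not JSON null, and the first branch is long.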

This is a common point of confusion.  We should probably document it
better.  I'm not sure whether it's causing the problem you're seeing,
but perhaps it is.

Cheers,

Doug

On 06/06/2012 04:15 AM, François Kawala wrote:
> Dear all,
>
> Despite my best efforts to get a working schema, I cannot manage to
> specify that a field of a given record can be either "a given type" or
> "null". I've tried unions, but the back-end that I have to use seems
> to be unhappy with them. More precisely: I'm trying to write the output
> of a Streaming MR job to an Avro container. This job is written in
> Python and executed through dumbo (http://www.dumbotics.com), and a
> custom OutputFormat is used
> (https://github.com/tomslabs/avro-utils/tree/master/src/main/java/com/tomslabs/grid/avro)
>
>
> However, since this custom OutputFormat relies on the org.apache.avro
> sources, I thought this list could be a good place to ask for help.
>
> Thanks for reading,
> François.
>
> ------------------------------------------------------------------------
>
> Here are some additional details:
>
> The fragment of the schema that I think is responsible for my troubles:
>
> {"name": "in_reply_to", "type": [{"type": "long"},"null"], "default":"null"}
>
> I've also tried, without success:
>
> {"name": "in_reply_to", "type": [{"type": "long"},"null"]}
> {"name": "in_reply_to", "type": ["null",{"type": "long"}]}
>
>     Each attempt ends with the same error message:
>
>         org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
>
>     Stack trace:
>
>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>     at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
>     at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
>     at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
>     at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
>
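
A note on the stack trace quoted above: it goes through JsonDecoder.readIndex, which suggests (this is inferred from the trace, not confirmed in this thread) that the OutputFormat decodes each record from JSON text. In Avro's JSON encoding, a union value is either JSON null or a JSON object with a single name/value pair whose name is the branch type, for example {"long": 42}. A bare number such as 42 in a union position would be consistent with "Expected start-union. Got VALUE_NUMBER_INT". The sketch below, in plain Python, wraps union values that way before serializing. The function names and the "id" field are hypothetical.

```python
import json


def encode_union_value(value, branch):
    """Wrap a value the way Avro's JSON encoding represents a union member.

    None maps to JSON null; anything else becomes {"<branch>": value}.
    """
    if value is None:
        return None
    return {branch: value}


def encode_record(row, union_fields):
    """Return a copy of row with union-typed fields wrapped for JSON encoding.

    union_fields maps a field name to the name of its non-null branch,
    e.g. {"in_reply_to": "long"} (hypothetical helper, not part of Avro).
    """
    out = {}
    for key, value in row.items():
        if key in union_fields:
            out[key] = encode_union_value(value, union_fields[key])
        else:
            out[key] = value
    return out


# "id" is a made-up field; "in_reply_to" is the field from the question.
line = json.dumps(encode_record({"id": 1, "in_reply_to": 42}, {"in_reply_to": "long"}))
```

Here line holds {"id": 1, "in_reply_to": {"long": 42}}. With in_reply_to set to None, the field is emitted as null instead.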