Avro, mail # user - optional enums

optional enums
Joe Crobak 2010-12-20, 19:53
What's the "best" way to represent an optional enum in Avro (in terms of
space efficiency, computational efficiency, and readability)? To be
consistent with other optional fields, I was planning to use a union of null
and my enum type. The other approach I could see was adding a NULL symbol to
the enum, but then my code would have to initialize the enum field to that
NULL symbol before every write.
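On the space-efficiency part of the question: in Avro's binary encoding, null is written as zero bytes, an enum is written as its zero-based symbol index (a zig-zag varint), and a union is written as the branch index followed by the value. The stdlib-only Java sketch below applies those rules by hand rather than calling Avro's own encoder, and counts the bytes each approach would produce for a single field. The class and method names are hypothetical, and putting the NULL sentinel at index 0 is an assumption.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch (not Avro's encoder): byte cost of two ways to encode an optional
 * enum, following the binary-encoding rules in the Avro specification.
 */
class OptionalEnumSize {

    /** Writes n as a zig-zag varint, the form Avro uses for int and long. */
    static void writeZigZag(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);              // small magnitudes -> small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));   // low 7 bits + continuation flag
            z >>>= 7;
        }
        out.write((int) z);                         // final byte, no continuation flag
    }

    /** Approach 1: union ["null", Suit]; symbolIndex < 0 means the field is null. */
    static byte[] encodeUnion(int symbolIndex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (symbolIndex < 0) {
            writeZigZag(0, out);                    // branch 0 = "null", no payload
        } else {
            writeZigZag(1, out);                    // branch 1 = the enum
            writeZigZag(symbolIndex, out);          // then the symbol index
        }
        return out.toByteArray();
    }

    /** Approach 2: enum with an extra NULL symbol (assumed to be index 0). */
    static byte[] encodeSentinel(int symbolIndex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeZigZag(symbolIndex, out);              // only the symbol index
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("union, null:      " + encodeUnion(-1).length + " byte(s)");
        System.out.println("union, SPADES:    " + encodeUnion(0).length + " byte(s)");
        System.out.println("sentinel, NULL:   " + encodeSentinel(0).length + " byte(s)");
        System.out.println("sentinel, SPADES: " + encodeSentinel(1).length + " byte(s)");
    }
}
```

Under these rules, with fewer than 64 symbols every index fits in one varint byte, so the union form costs 1 byte when the field is null and 2 bytes when it is set, while the sentinel form always costs 1 byte: the union pays one extra byte per set value.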

I've tried using a union of null and the enum type, but I've run into an
issue with this approach when using AvroOutputFormat. The following
code summarizes the issue:

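  public void testDataWriteWithSchema() throws IOException {
    // Default constructor: no class passed to SpecificDatumWriter.
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>());

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    // ... build an Event (elided) and writer.append(event): fails with "Not in union".
    writer.close();
  }

  public void testDataWriteWithSchemaWithClass() throws IOException {
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    // ... appending the same Event succeeds with this writer.
    writer.close();
  }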

When I don't pass Event.class to SpecificDatumWriter (the first test
method), the append fails with the following exception:

Not in union ["null", {"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]:
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
    at org.apache.avro.generic.GenericDatumWriter.write(
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(
    at org.apache.avro.generic.GenericDatumWriter.write(
    at org.apache.avro.generic.GenericDatumWriter.write(
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)

AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run into
the exception above when using it. Is there some way around this, other
than implementing my own OutputFormat that passes along the class?
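One plausible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: the exception is raised in GenericData.resolveUnion, which suggests the writer built by the default constructor resolves unions with the generic data model, and that model does not treat a generated Java enum value as an enum, so neither branch of ["null", Suit] matches. Passing Event.class would then select a specific data model that does accept Java enums. The stdlib-only sketch below models just that branch-selection decision; DataModel, GENERIC, SPECIFIC, GenericSymbol and resolveUnion are hypothetical stand-ins, not Avro's API.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of union-branch resolution. NOT Avro's code:
 * it only shows how the data model's notion of "is this an enum?" decides
 * whether a datum matches a union branch.
 */
class UnionResolutionSketch {

    enum Suit { SPADES, CLUBS }   // stands in for a generated enum (trimmed)

    /** Stand-in for a generic (non-Java-enum) enum symbol. */
    static final class GenericSymbol {
        final String name;
        GenericSymbol(String name) { this.name = name; }
    }

    /** A data model decides whether a datum counts as an enum value. */
    interface DataModel {
        boolean isEnum(Object datum);
    }

    /** Generic model: only its own symbol type counts as an enum. */
    static final DataModel GENERIC = datum -> datum instanceof GenericSymbol;

    /** Specific model: also accepts Java enums. */
    static final DataModel SPECIFIC =
        datum -> datum instanceof Enum<?> || GENERIC.isEnum(datum);

    /** Returns the index of the first branch that matches datum, or throws. */
    static int resolveUnion(List<String> branches, Object datum, DataModel model) {
        for (int i = 0; i < branches.size(); i++) {
            String type = branches.get(i);
            if (type.equals("null") && datum == null) return i;
            if (type.equals("enum") && model.isEnum(datum)) return i;
        }
        throw new IllegalArgumentException("Not in union " + branches + ": " + datum);
    }

    public static void main(String[] args) {
        List<String> optionalSuit = List.of("null", "enum");
        System.out.println(resolveUnion(optionalSuit, Suit.SPADES, SPECIFIC)); // 1
        try {
            resolveUnion(optionalSuit, Suit.SPADES, GENERIC);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());   // generic model finds no branch
        }
    }
}
```

If this reading is right, anything that makes the writer use the specific data model (as the Event.class constructor appears to) avoids the error, which matches the two test methods above.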