Avro, mail # user - optional enums


optional enums
Joe Crobak 2010-12-20, 19:53
What's the "best" way to represent an optional enum in Avro (in terms of
space efficiency, computational efficiency, and readability)?  To be
consistent with other optional fields, I was planning to use a union of null
and my enum type.  The other approach I could see was adding a NULL symbol to
the enum -- but then my code would have to initialize the enum field to NULL
before every write.
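
On the space question, here is a rough sketch of how the two layouts compare
on the wire.  It is not Avro's own code: it assumes the binary encoding
described in the Avro specification (ints and longs as zigzag varints, a union
as its branch index followed by the value, null as zero bytes), and the class
and method names (OptionalEnumSize, encodeUnion, encodeEnumWithNullSymbol) are
made up for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Avro: estimates encoded sizes by hand.
public class OptionalEnumSize {

    /** Zigzag + varint encoding, as the Avro spec describes for int and long. */
    public static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);  // zigzag: small magnitudes map to small unsigned values
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {     // more than 7 bits left: emit a continuation byte
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    /** Union ["null", enum]: branch index, then the enum ordinal if the value is set. */
    public static byte[] encodeUnion(Integer ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] branch = encodeLong(ordinal == null ? 0 : 1);
        out.write(branch, 0, branch.length);
        if (ordinal != null) {          // null itself is encoded as zero bytes
            byte[] value = encodeLong(ordinal);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    /** Plain enum that carries an extra NULL symbol: just the ordinal. */
    public static byte[] encodeEnumWithNullSymbol(int ordinal) {
        return encodeLong(ordinal);
    }

    public static void main(String[] args) {
        System.out.println("union, null:        " + encodeUnion(null).length + " byte(s)");
        System.out.println("union, symbol set:  " + encodeUnion(0).length + " byte(s)");
        System.out.println("enum + NULL symbol: " + encodeEnumWithNullSymbol(0).length + " byte(s)");
    }
}
```

Under those assumptions the union costs one byte for null and two bytes for a
set value, while the enum with a NULL symbol always costs one byte, so the
difference is at most one byte per non-null value.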

I've tried the union of null and the enum type, but I've run into an
issue with this approach when using AvroOutputFormat.  The following
code summarizes my issue:

  public void testDataWriteWithSchema() throws IOException {
    // Default constructor: no class is passed to the datum writer.
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>());

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

  public void testDataWriteWithSchemaWithClass() throws IOException {
    // Same write, but the datum writer is given Event.class.
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

When I don't pass Event.class to SpecificDatumWriter (the first test
method), the test fails with the following exception:

Not in union ["null", {"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]: SPADES
  at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
  at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:100)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:54)
  at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)

AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run into
the above exception when using it.  Is there some way around this (other
than implementing my own OutputFormat that passes along the class)?

Thanks,
Joe