Avro >> mail # user >> optional enums


optional enums
What's the "best" way to represent an optional enum in avro (in terms of
space efficiency, computational efficiency, and readability)?  To be
consistent with other optional fields, I was planning to use union of null
and my enum type.  The other approach I could see was adding a NULL symbol to
the enum, but then my code would have to initialize the enum field to NULL
before a write.
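
On the space-efficiency part of the question, here is a rough sketch of how the two options compare on the wire. It assumes the binary encoding described in the Avro specification (a union is written as the zero-based branch index followed by the value, an enum as the zero-based symbol position, both as zigzag varints). The class and method names are hypothetical and this is not Avro's own encoder.

```java
import java.io.ByteArrayOutputStream;

public class OptionalEnumSize {
    // Zigzag varint, the encoding the Avro spec uses for int values.
    static void writeInt(int n, ByteArrayOutputStream out) {
        int z = (n << 1) ^ (n >> 31);
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        out.write(z);
    }

    // Option 1: union ["null", enum]. Branch index first, then the ordinal.
    static byte[] encodeUnion(Integer ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (ordinal == null) {
            writeInt(0, out);        // branch 0 is "null", which has no payload
        } else {
            writeInt(1, out);        // branch 1 is the enum
            writeInt(ordinal, out);  // position of the symbol in the enum
        }
        return out.toByteArray();
    }

    // Option 2: enum with an extra NULL symbol. Always a single ordinal.
    static byte[] encodeSentinel(int ordinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(ordinal, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("union, null value:    " + encodeUnion(null).length + " byte(s)"); // 1
        System.out.println("union, enum value:    " + encodeUnion(0).length + " byte(s)");    // 2
        System.out.println("enum with NULL symbol: " + encodeSentinel(0).length + " byte(s)"); // 1
    }
}
```

Under those assumptions the union costs one extra byte only when a value is present, so for small enums the difference is at most one byte per field and the choice probably comes down to readability and consistency with the other optional fields.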

I've tried to use union of null and the enum-type, but I've run into an
issue with this approach when using the AvroOutputFormat.  The following
code summarizes my issue:

  // Fails: writer built with SpecificDatumWriter's default constructor.
  public void testDataWriteWithSchema() throws IOException {
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>());

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

  // Passes: writer built with the Event class passed in.
  public void testDataWriteWithSchemaWithClass() throws IOException {
    final DataFileWriter<Event> writer =
        new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));

    writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
    writer.append(getEvent());
    writer.close();
  }

When I don't pass Event.class to SpecificDatumWriter (the first test
method), the test fails with the following exception:

Not in union ["null", {"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]: SPADES
  at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
  at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:100)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:54)
  at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)

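One plausible reading of the trace is that the union branch is being chosen by the generic data model, which may not treat a generated Java enum constant as a value of the enum branch. The toy model below illustrates that idea only. It is not Avro's code: every name in it (`resolveUnion`, `GenericSymbol`, the two predicates, the trimmed `Suit` enum) is a hypothetical stand-in.

```java
import java.util.function.Predicate;

public class UnionResolutionSketch {
    // Trimmed stand-in for a generated enum class.
    enum Suit { SPADES, CLUBS }

    // Hypothetical "generic" enum value that carries only a symbol name.
    static final class GenericSymbol {
        final String name;
        GenericSymbol(String name) { this.name = name; }
    }

    // A generic-only check does not recognise java.lang.Enum instances.
    static final Predicate<Object> GENERIC_IS_ENUM =
        d -> d instanceof GenericSymbol;

    // A specific-aware check also accepts generated Java enums.
    static final Predicate<Object> SPECIFIC_IS_ENUM =
        d -> d instanceof Enum<?> || GENERIC_IS_ENUM.test(d);

    // Toy resolver for the union ["null", enum]: returns the branch index.
    static int resolveUnion(Object datum, Predicate<Object> isEnum) {
        if (datum == null) return 0;
        if (isEnum.test(datum)) return 1;
        throw new IllegalStateException("Not in union: " + datum);
    }

    public static void main(String[] args) {
        System.out.println(resolveUnion(Suit.SPADES, SPECIFIC_IS_ENUM)); // branch 1
        try {
            resolveUnion(Suit.SPADES, GENERIC_IS_ENUM);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Not in union: SPADES
        }
    }
}
```

If that reading is right, it would be consistent with the second test passing once the writer is given Event.class.
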
AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run into
the above exception when using it.  Is there some way around this (other
than implementing my own OutputFormat that passes along the class)?

Thanks,
Joe