Avro, mail # user - Jackson and Avro, nested schema


Jackson and Avro, nested schema
David Arthur 2013-05-08, 18:49
I'm attempting to use Jackson and Avro together to map JSON documents to
a generated Avro class. I have looked at the Json schema included with
Avro, but it requires a top-level "value" element, which I don't want.
Essentially, I have JSON documents with a few typed top-level fields and
one field called "fields" that holds more or less arbitrary JSON.

I've reduced this down to strings and ints for simplicity.

My first attempt was:

  {
     "type": "record",
     "name": "Json",
     "fields": [
       {
         "name": "value",
         "type": [ "string", "int", {"type": "map", "values": "Json"} ]
       }
     ]
   },

   {
     "name": "Document",
     "type": "record",
     "fields": [
       {
         "name": "id",
         "type": "string"
       },
       {
         "name": "fields",
         "type": {"type": "map", "values": ["string", "int", {"type": "map", "values": "Json"}]}
       }
     ]
   }

Given a JSON document like:

{
   "id": "doc1",
   "fields": {
     "foo": "bar",
     "spam": "eggs",
     "answer": 42,
     "x": {"a": 1}
   }
}

this appears to work at first, but when I turn around and try to
serialize the resulting object with Avro, I get the following exception:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.avro.generic.IndexedRecord
     at org.apache.avro.generic.GenericData.getField(GenericData.java:526)
     at org.apache.avro.generic.GenericData.getField(GenericData.java:541)
     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
     at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:173)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:69)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
     at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:173)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:69)
     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)

My best guess is that since the values of the "fields" map are a union,
their representation in the generated class is an Object, which Jackson
happily puts anything into.
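
One possible workaround, offered only as a sketch: the stack trace suggests
Avro expects a "Json" record where it finds a bare Integer, so the values of
nested maps could be wrapped after Jackson has produced its untyped result.
The example below uses only java.util. The names JsonBox, FieldsWrapper, box,
boxValues and fixFields are hypothetical; JsonBox merely stands in for the
generated "Json" class and would be replaced by it in real code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the generated Avro class "Json" (single field "value"). */
final class JsonBox {
    final Object value;

    JsonBox(Object value) {
        this.value = value;
    }
}

/** Hypothetical helper that post-processes the untyped map Jackson produced. */
final class FieldsWrapper {
    private FieldsWrapper() {}

    /** Boxes one raw value; a map is boxed after its own values are boxed recursively. */
    static JsonBox box(Object raw) {
        if (raw instanceof Map) {
            return new JsonBox(boxValues((Map<?, ?>) raw));
        }
        return new JsonBox(raw);
    }

    /** Returns a copy of a nested map in which every value is a JsonBox. */
    static Map<String, Object> boxValues(Map<?, ?> raw) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : raw.entrySet()) {
            out.put(String.valueOf(e.getKey()), box(e.getValue()));
        }
        return out;
    }

    /**
     * Top-level "fields" map: strings and ints stay as they are, because the
     * union allows them directly; only nested maps get boxed values.
     */
    static Map<String, Object> fixFields(Map<String, Object> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Object v = e.getValue();
            out.put(e.getKey(), v instanceof Map ? boxValues((Map<?, ?>) v) : v);
        }
        return out;
    }
}
```

For the test document above, fixFields would leave "foo", "spam" and "answer"
untouched and turn the 1 under "x" -> "a" into a boxed value. Whether this is
enough for Avro to serialize the result has not been verified here.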

If I change my schema to explicitly use "int" instead of the "Json"
type, it works fine for my test document:

         "type": {"type": "map", "values": ["string", "int", {"type": "map", "values": "int"}]}

However, now I need to enumerate the types for each level of nesting I
want. This is not ideal, and it limits me to a fixed depth of nesting.
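
To illustrate why enumerating levels by hand scales poorly, here is a small
sketch that generates the value union for a chosen maximum depth. NestedMapType
and valueUnion are hypothetical names, and the sketch assumes strings and ints
are allowed at every level (the hand-written variant above allows only "int"
inside the nested map).

```java
/** Hypothetical helper that spells out the value union for a fixed nesting depth. */
final class NestedMapType {
    private NestedMapType() {}

    /**
     * depth 0 allows only scalars; each further level adds one more
     * {"type": "map", ...} branch whose values are the union one level down.
     */
    static String valueUnion(int depth) {
        if (depth <= 0) {
            return "[\"string\", \"int\"]";
        }
        return "[\"string\", \"int\", {\"type\": \"map\", \"values\": "
                + valueUnion(depth - 1) + "}]";
    }
}
```

Each extra level lengthens the generated type, and any document nested deeper
than the chosen depth still would not fit the schema.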

To be clear, my issue is not modelling my schema in Avro, but rather
getting Jackson to map JSON onto the generated classes without too much
pain. I have also tried
https://github.com/FasterXML/jackson-dataformat-avro without much luck.

Any help is appreciated.

-David