Avro, mail # user - Newb question on importing JSON and defaults

Gregory 2013-05-22, 21:26


I have a test.json file that looks like this:

{"first":"John", "last":"Doe", "middle":"C"}
{"first":"John", "last":"Doe"}

(The second line does NOT have a "middle" element.)

And I have a test.schema file that looks like this:

  "fields": [
     {"name":"first",  "type":"string"},
     {"name":"middle", "type":"string", "default":""},
     {"name":"last",   "type":"string"}

I then try to use fromjson, as follows, and it chokes on the second line:

$ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema test.json > test.avro
Exception in thread "main" org.apache.avro.AvroTypeException: Expected field name not found: middle
         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)
         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
         at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)
         at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
         at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
         at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
         at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
         at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:105)
         at org.apache.avro.tool.Main.run(Main.java:80)
         at org.apache.avro.tool.Main.main(Main.java:69)

In short, I need to convert a batch of JSON records in which a field
may sometimes be absent; when it is, I'd like it to default to
something sensible, e.g. an empty string or null.
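
One possible workaround, sketched here in Python and not taken from avro-tools, is to fill in the absent fields before the data ever reaches fromjson. The helper name fill_defaults is hypothetical, and the sketch assumes one JSON object per line and a flat record whose fields are plain (non-union) types, as in the schema fragment above:

```python
import json


def fill_defaults(record, fields):
    """Return a copy of record with schema defaults added for absent fields."""
    filled = {}
    for field in fields:
        name = field["name"]
        if name in record:
            filled[name] = record[name]
        elif "default" in field:
            # Field is absent from the JSON line: fall back to the schema default.
            filled[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return filled


# Field list copied from the test.schema fragment in this message.
FIELDS = [
    {"name": "first", "type": "string"},
    {"name": "middle", "type": "string", "default": ""},
    {"name": "last", "type": "string"},
]

# The two lines of test.json.
LINES = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]

for line in LINES:
    print(json.dumps(fill_defaults(json.loads(line), FIELDS)))
```

The printed lines could then be saved and passed to fromjson in place of test.json, since every record now carries all three fields.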

According to the "Schema Resolution" section of the Avro specification,
"if the reader's record schema has a field that contains a default
value, and writer's schema does not have a field with the same name,
then the reader should use the default value from its field."
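
The quoted rule involves two schemas: the one the data was written with and the one it is read with. A minimal Python sketch of that rule as I read it, not Avro's implementation, looks like the following. The name resolve_record and the two field lists are hypothetical, and only the missing-field case is modeled:

```python
def resolve_record(datum, writer_fields, reader_fields):
    """Apply the reader's defaults for fields the writer's schema lacks."""
    writer_names = {f["name"] for f in writer_fields}
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            # Both schemas know the field: take the written value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Only the reader knows the field: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} has no default")
    return resolved


# Hypothetical writer's schema without "middle".
WRITER_FIELDS = [
    {"name": "first", "type": "string"},
    {"name": "last", "type": "string"},
]

# Reader's fields, matching the test.schema fragment.
READER_FIELDS = [
    {"name": "first", "type": "string"},
    {"name": "middle", "type": "string", "default": ""},
    {"name": "last", "type": "string"},
]

print(resolve_record({"first": "John", "last": "Doe"}, WRITER_FIELDS, READER_FIELDS))
```

Note that the rule is phrased in terms of a writer's schema that lacks the field. My fromjson call supplies only one schema, so it is not obvious to me that the rule is applied there at all.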

I'm clearly missing something obvious; any help would be appreciated!