Avro >> mail # user >> NPE with Generic/SpecificDatumWriter in Avro 1.3.3


Lewis John Mcgibbney 2012-12-07, 16:39
Re: NPE with Generic/SpecificDatumWriter in Avro 1.3.3
Hi,
Fortunately, we discovered the flaw in the .avsc schema.

The schema has been changed and is appended below. Note that the record
fields protocolStatus and parseStatus are now optional: each is declared
as a union with "null". This solved the issue.

Best

Lewis

{"name": "WebPage",
 "type": "record",
 "namespace": "org.apache.nutch.storage",
 "fields": [
        {"name": "baseUrl", "type": ["null","string"] },
        {"name": "status", "type": "int"},
        {"name": "fetchTime", "type": "long"},
        {"name": "prevFetchTime", "type": "long"},
        {"name": "fetchInterval", "type": "int"},
        {"name": "retriesSinceFetch", "type": "int"},
        {"name": "modifiedTime", "type": "long"},
        {"name": "protocolStatus", "type": ["null", {
            "name": "ProtocolStatus",
            "type": "record",
            "namespace": "org.apache.nutch.storage",
            "fields": [
                {"name": "code", "type": "int"},
                {"name": "args", "type": {"type": "array", "items": "string"}},
                {"name": "lastModified", "type": "long"}
            ]
            }]},
        {"name": "content", "type": ["null","bytes"]},
        {"name": "contentType", "type": ["null","string"] },
        {"name": "prevSignature", "type": ["null","bytes"]},
        {"name": "signature", "type": ["null","bytes"]},
        {"name": "title", "type": ["null","string"] },
        {"name": "text", "type": ["null","string"] },
        {"name": "parseStatus", "type": ["null",{
            "name": "ParseStatus",
            "type": "record",
            "namespace": "org.apache.nutch.storage",
            "fields": [
                {"name": "majorCode", "type": "int"},
                {"name": "minorCode", "type": "int"},
                {"name": "args", "type": {"type": "array", "items": "string"}}
            ]
            }]},
        {"name": "score", "type": "float"},
        {"name": "reprUrl", "type": ["null","string"] },
        {"name": "headers", "type": {"type": "map", "values": "string"}},
        {"name": "outlinks", "type": {"type": "map", "values": "string"}},
        {"name": "inlinks", "type": {"type": "map", "values": "string"}},
        {"name": "markers", "type": {"type": "map", "values": "string"}},
        {"name": "metadata", "type": {"type": "map", "values": "bytes"}}
   ]
}
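
A rough way to check which fields of a schema like the one above must
always be set: in Avro, a field accepts null only if its type is a union
that includes "null". The sketch below uses only Python's json module on
a trimmed copy of the schema (most fields omitted). The helper names
admits_null and fields_requiring_value are made up for illustration and
are not part of Avro, Gora, or Nutch.

```python
import json

# Trimmed-down copy of the WebPage schema from this message (most fields omitted).
SCHEMA_JSON = """
{"name": "WebPage",
 "type": "record",
 "namespace": "org.apache.nutch.storage",
 "fields": [
    {"name": "baseUrl", "type": ["null", "string"]},
    {"name": "status", "type": "int"},
    {"name": "protocolStatus", "type": ["null", {
        "name": "ProtocolStatus",
        "type": "record",
        "namespace": "org.apache.nutch.storage",
        "fields": [
            {"name": "code", "type": "int"},
            {"name": "args", "type": {"type": "array", "items": "string"}},
            {"name": "lastModified", "type": "long"}
        ]
    }]},
    {"name": "headers", "type": {"type": "map", "values": "string"}}
 ]
}
"""


def admits_null(field_type):
    """True if the type is "null" or a union (JSON array) with a "null" branch."""
    if isinstance(field_type, list):
        return "null" in field_type
    return field_type == "null"


def fields_requiring_value(record_schema):
    """Names of the record's top-level fields that cannot be left null."""
    return [f["name"] for f in record_schema["fields"]
            if not admits_null(f["type"])]


schema = json.loads(SCHEMA_JSON)
required = fields_requiring_value(schema)
print(required)  # ['status', 'headers']
```

Any field this reports has to be populated before the record is written.
A nested record field declared without a "null" branch and left unset
would be one such case, which appears consistent with the trace quoted
below.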

On Fri, Dec 7, 2012 at 4:39 PM, Lewis John Mcgibbney
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have an issue over in Nutch where we are trying to inject URLs into
> an Avro-backed file store (which resides in Gora [0]). The schema we
> are using to generate the Java classes to store the data can be found
> here [1].
>
> Currently, when I use the Nutch Inject tool (an MR job which reads a
> flat file of URLs, adds metadata, and then stores them in the file
> store), I get the following stack trace:
>
> java.lang.NullPointerException
>         at org.apache.avro.specific.SpecificDatumWriter.getField(SpecificDatumWriter.java:48)
>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
>         at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
>         at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
>         at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)

Lewis