Avro >> mail # user >> How to read with SpecificDatumReader


How to read with SpecificDatumReader
I can write my Avro data fine, but how do I read my data records with
SpecificDatumReader?

Basically, I write my (HDFS) data file like this:
Schema schema = new MyRecord().getSchema();
DatumWriter<MyRecord> writer = new SpecificDatumWriter<MyRecord>(schema);
DataFileWriter<MyRecord> dataFileWriter = new DataFileWriter<MyRecord>(writer);
FSDataOutputStream fos = fs.create(avroPath);
dataFileWriter.create(schema, fos);
for (MyRecord r : map.values()) {
    dataFileWriter.flush();
    dataFileWriter.append(r);
}
dataFileWriter.flush();

I know this works because my MR job processes the generated files via
    Job job = new Job(config, jobName);
    job.setJarByClass(getClass());
    AvroJob.setInputKeySchema(job, schema);
    AvroJob.setInputValueSchema(job, schema);
    job.setInputFormatClass(AvroKeyInputFormat.class);
    job.setMapperClass(MyMapper.class);

Now I need to read the file from a different (non-Hadoop) application, but
when I try to read the data like this:
596 DatumReader<MyRecord> myDatumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
597 DataFileReader<MyRecord> dataFileReader = new DataFileReader<MyRecord>(localFile, myDatumReader);
598 MyRecord record = null;
599 String owner = null;
600 while (dataFileReader.hasNext()) {
601 record = dataFileReader.next(record);
602 owner = record.getOwner().toString();
603 System.out.printf("owner = %s\n", owner);
604 }
605 dataFileReader.close();

I get this error:
Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.company.app.MyRecord
    at com.company.app.MyDriver.readAvroData(MyDriver.java:601)
    at com.company.app.MyDriver.main(MyDriver.java:1378)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Alan
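
A side note on the trace above: the top frame is MyDriver.java:601, the `record = dataFileReader.next(record)` line, not a frame inside Avro. That is consistent with Java generics being erased at runtime: a reader can hand back any object, and the cast to `MyRecord` that the compiler inserts at the assignment is what fails. The sketch below reproduces only that mechanism in plain Java with no Avro dependency. `SpecificThing`, `GenericThing`, and `Reader` are hypothetical stand-ins, and the sketch does not explain why the reader produced a `GenericData$Record` in the first place.

```java
public class ErasureCastDemo {

    // Hypothetical stand-in for a generated class such as MyRecord.
    static class SpecificThing {
        String owner = "alan";
    }

    // Hypothetical stand-in for a generic record type such as GenericData.Record.
    static class GenericThing {
        String owner = "alan";
    }

    // Hypothetical stand-in for a reader whose element type T is erased at runtime.
    static class Reader<T> {
        private final Object payload;

        Reader(Object payload) {
            this.payload = payload;
        }

        @SuppressWarnings("unchecked")
        T next() {
            // Unchecked cast: nothing is verified here at runtime, so no exception yet.
            return (T) payload;
        }
    }

    // Reads one item as SpecificThing and reports what happened at the call site.
    static String readAs(Object payload) {
        Reader<SpecificThing> reader = new Reader<SpecificThing>(payload);
        try {
            // The compiler inserts the real cast to SpecificThing on this line.
            SpecificThing t = reader.next();
            return "read owner " + t.owner;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Payload matches the declared type: the read succeeds.
        System.out.println(readAs(new SpecificThing()));
        // Payload is a different class: the cast at the call site fails.
        System.out.println(readAs(new GenericThing()));
    }
}
```

In this sketch the first call returns the owner and the second reports `ClassCastException`, raised in the caller rather than inside `Reader.next()`, which mirrors where the trace above points.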