How to read with SpecificDatumReader
Alan Miller 2012-12-20, 15:21
I can write my Avro data fine, but how do I read my data records with the SpecificDatumReader?
Basically, I write my (HDFS) data file like this:

    Schema schema = new MyRecord().getSchema();
    DatumWriter<MyRecord> writer = new SpecificDatumWriter<MyRecord>(schema);
    DataFileWriter<MyRecord> dataFileWriter = new DataFileWriter<MyRecord>(writer);
    FSDataOutputStream fos = fs.create(avroPath);
    dataFileWriter.create(schema, fos);
    for (MyRecord r : map.values()) {
        dataFileWriter.flush();
        dataFileWriter.append(r);
    }
    dataFileWriter.flush();
This works fine because my MR job processes the generated files via

    Job job = new Job(config, jobName);
    job.setJarByClass(getClass());
    AvroJob.setInputKeySchema(job, schema);
    AvroJob.setInputValueSchema(job, schema);
    job.setInputFormatClass(AvroKeyInputFormat.class);
    job.setMapperClass(MyMapper.class);
Now I need to read the file from a different (non-Hadoop) application, but when I try to read the data like this:

    596  DatumReader<MyRecord> myDatumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
    597  DataFileReader<MyRecord> dataFileReader = new DataFileReader<MyRecord>(localFile, myDatumReader);
    598  MyRecord record = null;
    599  String owner = null;
    600  while (dataFileReader.hasNext()) {
    601      record = dataFileReader.next(record);
    602      owner = record.getOwner().toString();
    603      System.out.printf("owner = %s\n", owner);
    604  }
    605  dataFileReader.close();
I get this error:

    Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.company.app.MyRecord
        at com.company.app.MyDriver.readAvroData(MyDriver.java:601)
        at com.company.app.MyDriver.main(MyDriver.java:1378)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Alan
Re: How to read with SpecificDatumReader
Doug Cutting 2012-12-20, 17:49
It looks to me like com.company.app.MyRecord is not on the classpath in your non-Hadoop application.
Doug
Re: How to read with SpecificDatumReader
Alan Miller 2012-12-21, 13:54
Thanks Doug, I guess that's the problem (somehow), but I don't see why writing the Avro file works while reading it doesn't. I mean, I'm writing and reading the files in the same way.
When I said non-Hadoop, I meant that the Java program isn't running via Hadoop yet. This code is in my "Driver" class, before I actually submit the job to Hadoop. Basically, this is what I do.
I JAR up my classes (MyAppDriver, MyAppMapper, MyAppReducer, MyAppRecord) in my.jar, then run this wrapper to trigger my MR job:

    DRIVER="com.company.app.MyAppDriver"
    JAR="/some/path/my.jar"
    ARGS="-debug -overwrite"
    EXTRAJARS="lib/logback-core-1.0.6.jar:lib/logback-classic-1.0.6.jar:lib/json_simple-1.1.jar"
    export HADOOP_USER_CLASSPATH_FIRST="true"
    export HADOOP_CLASSPATH=${EXTRAJARS}
    hadoop jar ${JAR} ${DRIVER} ${ARGS}

Writing the Avro file in the "Driver" code works but reading does not. HOWEVER, if I add my.jar to EXTRAJARS then reading the Avro file works.
Alan
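
Editorial note: the thread does not confirm why writing works while reading fails. One plausible explanation, stated here as an assumption, is that `hadoop jar` loads the classes in my.jar through a separate class loader, while Avro is loaded from the Hadoop classpath. Writing only needs the MyRecord instances the driver already holds, but reading has to resolve the record class by name, and a lookup through a loader that cannot see my.jar would fail, leaving a generic record in its place. The sketch below uses only the JDK and shows one way to check which class loaders can resolve a class by name. The class name LoaderVisibility and the method isVisible are hypothetical, not part of Avro or Hadoop.

```java
// Hypothetical diagnostic; LoaderVisibility and isVisible are illustrative names.
class LoaderVisibility {

    /** Returns true if the named class can be resolved through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the record class named in the thread; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.company.app.MyRecord";

        ClassLoader own = LoaderVisibility.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader system = ClassLoader.getSystemClassLoader();

        System.out.printf("%s via loader of this class: %b%n", name, isVisible(name, own));
        System.out.printf("%s via context class loader: %b%n", name, isVisible(name, context));
        System.out.printf("%s via system class loader:  %b%n", name, isVisible(name, system));
    }
}
```

If such a check, run from the driver under `hadoop jar`, reported the record class as visible to the driver's own loader but not to the system class loader, that would be consistent with the observation that adding my.jar to HADOOP_CLASSPATH makes reading work.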