Avro, mail # user - How to read with SpecificDatumReader


Re: How to read with SpecificDatumReader
Alan Miller 2012-12-21, 13:54
Thanks Doug,
I guess that's the problem (somehow), but I don't see why writing the Avro
file works while reading it doesn't. I'm writing and reading the files the
same way.

When I said non-Hadoop, I meant the Java program isn't (yet) running via
Hadoop. This code is in my "Driver" class, before I actually submit the job
to Hadoop. Basically, this is what I do.

I JAR up my classes (MyAppDriver, MyAppMapper, MyAppReducer, MyAppRecord) in
my.jar, then run this wrapper to trigger my MR job:

     DRIVER="com.company.app.MyAppDriver"
     JAR="/some/path/my.jar"
     ARGS="-debug -overwrite"
     EXTRAJARS="lib/logback-core-1.0.6.jar:lib/logback-classic-1.0.6.jar:lib/json_simple-1.1.jar"
     export HADOOP_USER_CLASSPATH_FIRST="true"
     export HADOOP_CLASSPATH=${EXTRAJARS}
     hadoop jar ${JAR} ${DRIVER} ${ARGS}

Writing the Avro file in the "Driver" code works but reading does not.
HOWEVER, if I add my.jar to EXTRAJARS then reading the Avro file works.
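
As far as I can tell, when the generated class can't be found on the
classpath, SpecificDatumReader quietly falls back to generic records, which
would explain the GenericData$Record in the ClassCastException below. Here is
a hypothetical sanity check (not code from my actual driver; the class name
and "owner" field are just the ones from my example below) that reads the
file with the generic API, which needs no generated classes at all:

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class GenericReadCheck {
    public static void main(String[] args) throws IOException {
        // First, check whether the generated class is visible to this JVM.
        try {
            Class.forName("com.company.app.MyRecord");
            System.out.println("MyRecord IS on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("MyRecord is NOT on the classpath");
        }

        // Read with the generic API, which needs no generated classes. If this
        // prints records while the SpecificDatumReader version throws a
        // ClassCastException, the data file is fine and only the classpath
        // differs between the two runs.
        File avroFile = new File(args[0]); // local copy of the Avro data file
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                avroFile, new GenericDatumReader<GenericRecord>());
        try {
            while (reader.hasNext()) {
                GenericRecord record = reader.next();
                System.out.printf("owner = %s%n", record.get("owner"));
            }
        } finally {
            reader.close();
        }
    }
}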

Alan
On Thu, Dec 20, 2012 at 6:49 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> It looks to me like in your non-Hadoop application
> com.company.app.MyRecord is not on the classpath.
>
> Doug
>
> On Thu, Dec 20, 2012 at 7:21 AM, Alan Miller <[EMAIL PROTECTED]>
> wrote:
> > I can write my Avro data fine, but how do I read my data records with a
> > SpecificDatumReader?
> >
> > Basically, I write my (hdfs) data file like this:
> >
> > Schema schema = new MyRecord().getSchema();
> > DatumWriter<MyRecord> writer = new SpecificDatumWriter<MyRecord>(schema);
> > DataFileWriter<MyRecord> dataFileWriter = new DataFileWriter<MyRecord>(writer);
> > FSDataOutputStream fos = fs.create(avroPath);
> > dataFileWriter.create(schema, fos);
> > for (MyRecord r : map.values()) {
> >     dataFileWriter.append(r);
> > }
> > dataFileWriter.flush();
> > dataFileWriter.close();
> >
> > This works fine because my MR job processes the generated files via:
> >
> >     Job job = new Job(config, jobName);
> >     job.setJarByClass(getClass());
> >     AvroJob.setInputKeySchema(job, schema);
> >     AvroJob.setInputValueSchema(job, schema);
> >     job.setInputFormatClass(AvroKeyInputFormat.class);
> >     job.setMapperClass(MyMapper.class);
> >
> > Now I need to read the file from a different (non-Hadoop) application, but
> > when I try to read the data like this:
> >
> > 596 DatumReader<MyRecord> myDatumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
> > 597 DataFileReader<MyRecord> dataFileReader = new DataFileReader<MyRecord>(localFile, myDatumReader);
> > 598 MyRecord record = null;
> > 599 String owner = null;
> > 600 while (dataFileReader.hasNext()) {
> > 601     record = dataFileReader.next(record);
> > 602     owner = record.getOwner().toString();
> > 603     System.out.printf("owner = %s\n", owner);
> > 604 }
> > 605 dataFileReader.close();
> >
> > I get this error:
> >
> > Exception in thread "main" java.lang.ClassCastException:
> > org.apache.avro.generic.GenericData$Record cannot be cast to com.company.app.MyRecord
> >     at com.company.app.MyDriver.readAvroData(MyDriver.java:601)
> >     at com.company.app.MyDriver.main(MyDriver.java:1378)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > Alan
>