Avro user mailing list: How to read with SpecificDatumReader


Alan Miller 2012-12-20, 15:21
Doug Cutting 2012-12-20, 17:49

Re: How to read with SpecificDatumReader
Thanks Doug,

I guess that's the problem (somehow), but I don't see why writing the Avro
file works while reading it doesn't. I'm writing and reading the files the
same way.

When I said non-Hadoop, I meant the Java program isn't (yet) running via
Hadoop. This code is in my "Driver" class, before I actually submit the job
to Hadoop. Basically, this is what I do.

I JAR up my classes (MyAppDriver, MyAppMapper, MyAppReducer, MyAppRecord)
into my.jar, then run this wrapper to trigger my MR job:

     DRIVER="com.company.app.MyAppDriver"
     JAR="/some/path/my.jar"
     ARGS="-debug -overwrite"
     EXTRAJARS="lib/logback-core-1.0.6.jar:lib/logback-classic-1.0.6.jar:lib/json_simple-1.1.jar"
     export HADOOP_USER_CLASSPATH_FIRST="true"
     export HADOOP_CLASSPATH=${EXTRAJARS}
     hadoop jar ${JAR} ${DRIVER} ${ARGS}

Writing the Avro file in the "Driver" code works but reading does not.
HOWEVER, if I add my.jar to EXTRAJARS then reading the Avro file works.
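The workaround Alan describes amounts to putting the job jar itself on the client-side classpath before launching the driver. A minimal sketch, using the hypothetical paths from the wrapper script above:

```shell
# Hypothetical paths, taken from the wrapper script in this thread.
JAR="/some/path/my.jar"
EXTRAJARS="lib/logback-core-1.0.6.jar:lib/logback-classic-1.0.6.jar:lib/json_simple-1.1.jar"

# Appending the job jar makes com.company.app.MyRecord visible to the
# driver JVM before job submission, so SpecificDatumReader can load it
# instead of falling back to generic records.
export HADOOP_USER_CLASSPATH_FIRST="true"
export HADOOP_CLASSPATH="${EXTRAJARS}:${JAR}"
echo "${HADOOP_CLASSPATH}"
```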

Alan
On Thu, Dec 20, 2012 at 6:49 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> It looks to me like in your non-Hadoop application
> com.company.app.MyRecord is not on the classpath.
>
> Doug
>
> On Thu, Dec 20, 2012 at 7:21 AM, Alan Miller <[EMAIL PROTECTED]>
> wrote:
> > I can write my Avro data fine, but how do I read my data records with the
> > SpecificDatum reader?
> >
> > Basically, I write my (hdfs) data file like this:
> >
> >     Schema schema = new MyRecord().getSchema();
> >     DatumWriter<MyRecord> writer = new SpecificDatumWriter<MyRecord>(schema);
> >     DataFileWriter<MyRecord> dataFileWriter = new DataFileWriter<MyRecord>(writer);
> >     FSDataOutputStream fos = fs.create(avroPath);
> >     dataFileWriter.create(schema, fos);
> >     for (MyRecord r : map.values()) {
> >         dataFileWriter.append(r);
> >     }
> >     dataFileWriter.flush();
> >     dataFileWriter.close();
> >
> > This works fine because my MR job processes the generated files via:
> >
> >     Job job = new Job(config, jobName);
> >     job.setJarByClass(getClass());
> >     AvroJob.setInputKeySchema(job, schema);
> >     AvroJob.setInputValueSchema(job, schema);
> >     job.setInputFormatClass(AvroKeyInputFormat.class);
> >     job.setMapperClass(MyMapper.class);
> >
> > Now I need to read the file from a different (non-Hadoop) application, but
> > when I try to read the data like this:
> >
> >     596 DatumReader<MyRecord> myDatumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
> >     597 DataFileReader<MyRecord> dataFileReader = new DataFileReader<MyRecord>(localFile, myDatumReader);
> >     598 MyRecord record = null;
> >     599 String owner = null;
> >     600 while (dataFileReader.hasNext()) {
> >     601     record = dataFileReader.next(record);
> >     602     owner = record.getOwner().toString();
> >     603     System.out.printf("owner = %s\n", owner);
> >     604 }
> >     605 dataFileReader.close();
> >
> > I get this error:
> >
> >     Exception in thread "main" java.lang.ClassCastException:
> >     org.apache.avro.generic.GenericData$Record cannot be cast to com.company.app.MyRecord
> >         at com.company.app.MyDriver.readAvroData(MyDriver.java:601)
> >         at com.company.app.MyDriver.main(MyDriver.java:1378)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > Alan
>
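The ClassCastException in the quoted trace is consistent with Doug's diagnosis: when the specific class cannot be loaded, the reader falls back to a generic record, and the cast to the specific type then fails. The mechanism can be modeled with a stdlib-only sketch; all class names here are illustrative stand-ins, not Avro's actual internals:

```java
import java.util.HashMap;
import java.util.Map;

public class FallbackSketch {
    // Stand-in for Avro's GenericData.Record: a bag of named fields.
    static class GenericRecord {
        final Map<String, Object> fields = new HashMap<>();
    }

    // Stand-in for the generated specific class com.company.app.MyRecord.
    static class MyRecord extends GenericRecord { }

    // Mimics the reader's behavior: instantiate the named class if it
    // can be loaded, otherwise fall back to a generic record.
    static GenericRecord read(String className) {
        try {
            Class<?> c = Class.forName(className);
            return (GenericRecord) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return new GenericRecord(); // class not on the classpath
        }
    }

    public static void main(String[] args) {
        // Class is loadable: the cast to the specific type succeeds.
        MyRecord ok = (MyRecord) read(MyRecord.class.getName());

        // Class is not loadable (stands in for the driver JVM missing
        // my.jar): the generic fallback cannot be cast to the specific
        // type, which is exactly the exception in the stack trace above.
        try {
            MyRecord bad = (MyRecord) read("com.company.app.MyRecord");
        } catch (ClassCastException e) {
            System.out.println("generic record cannot be cast to MyRecord");
        }
    }
}
```

This is why adding my.jar to HADOOP_CLASSPATH fixes the read: it makes the specific class loadable in the driver JVM, so no generic fallback occurs.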