Avro >> mail # user >> Hadoop 0.23, Avro Specific 1.6.3 and "org.apache.avro.generic.GenericData$Record cannot be cast to "

Jacob Metcalf 2012-05-13, 11:48
Russell Jurney 2012-05-13, 12:12
Ken Krugler 2012-05-13, 18:18
Jacob Metcalf 2012-05-13, 21:03
Ken Krugler 2012-05-13, 23:29
RE: Hadoop 0.23, Avro Specific 1.6.3 and "org.apache.avro.generic.GenericData$Record cannot be cast to "

Thanks for the suggestions here. I have finally got things working on Hadoop 0.20.2 + Avro 1.7 and got to the bottom of what was wrong with Hadoop 0.23.
The root of the issue is that Avro's SpecificData by default uses the classloader it was loaded with to try and create the classes you are deserializing.
If you look at line 51 of: http://svn.apache.org/viewvc/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java?view=markup you will see
  protected SpecificData() { this(SpecificData.class.getClassLoader()); }

Since Hadoop 0.23 ships with and requires Avro, SpecificData was getting loaded by the parent classloader. Debugging showed that this classloader did not have my job JAR on it, whereas the child classloader which loaded my Reducer does. Thus when Avro tried to create an instance of my class to deserialize into, it could not find it. Not being able to load the class, Avro defaults to generic mode, which is why I was getting this fairly obscure message.
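The failure mode described above can be reproduced in miniature with the JDK alone. The sketch below is not Avro's code: the class FallbackSketch, its methods resolve and newRecord, and the HashMap used as a stand-in for GenericData.Record are all hypothetical, chosen only to show how a class lookup that fails quietly in one classloader can surface later as a confusing cast error.

```java
import java.util.HashMap;

public class FallbackSketch {

    /** Looks up a class by name in the given loader; returns null if that loader cannot see it. */
    public static Class<?> resolve(String className, ClassLoader loader) {
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return null; // the lookup failure is swallowed here, as in the behaviour described above
        }
    }

    /** Instantiates the named class if it is visible, otherwise falls back to a generic container. */
    public static Object newRecord(String className, ClassLoader loader) {
        Class<?> c = resolve(className, loader);
        if (c == null) {
            // Hypothetical stand-in for GenericData.Record
            return new HashMap<String, Object>();
        }
        try {
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = FallbackSketch.class.getClassLoader();

        // Visible to this loader, so the requested type comes back.
        Object visible = newRecord("java.lang.StringBuilder", loader);
        System.out.println(visible.getClass().getName());

        // Not visible to this loader, so the generic stand-in comes back instead.
        Object missing = newRecord("net.jacobmetcalf.avro.Room", loader);
        System.out.println(missing.getClass().getName());
    }
}
```

A caller that casts the second result to the class it asked for would only find out at that point that the fallback had been used, which matches the "cannot be cast to" symptom in the subject line.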
I think it should be fairly easy to fix so am considering raising a JIRA for it - thoughts below. But to get myself going I switched back to Hadoop 0.20.2, which does not appear to ship with Avro and, as you said, is very easy to run up on Cygwin.
Many thanks
In terms of fixing this, part of the work was tackled in:
However I am using the MR2 Serializers (formerly odiago-avro) being integrated into Avro 1.7, so I do not construct my own SpecificDatumReader. So I had a go at patching AvroDeserializer to find the appropriate classloader and construct a SpecificData with it. However you then fall foul of line 277 of http://svn.apache.org/viewvc/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java?view=markup:
  public Object newRecord(Object old, Schema schema) {
    Class c = SpecificData.get().getClass(schema);
Which oddly seems to use the singleton rather than the SpecificData you have so carefully constructed.
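To make the singleton problem concrete, here is a JDK-only toy. It is not Avro's source: SingletonSketch, Resolver and both loaderUsedBy... methods are hypothetical names, and the empty URLClassLoader merely stands in for whatever child classloader holds the job's classes. The sketch shows that an instance built with a custom classloader gains nothing if one of its methods routes through the static singleton instead of through this.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class SingletonSketch {

    /** Toy stand-in for a SpecificData-like helper that resolves classes through one classloader. */
    public static class Resolver {
        private static final Resolver INSTANCE = new Resolver(Resolver.class.getClassLoader());
        private final ClassLoader loader;

        public Resolver(ClassLoader loader) {
            this.loader = loader;
        }

        public static Resolver get() {
            return INSTANCE;
        }

        public ClassLoader loader() {
            return loader;
        }

        /** Mirrors the quoted pattern: goes through the singleton, so this.loader is ignored. */
        public ClassLoader loaderUsedBySingletonCall() {
            return Resolver.get().loader();
        }

        /** What a caller who built a custom instance would expect: uses this.loader. */
        public ClassLoader loaderUsedByInstanceCall() {
            return this.loader();
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = SingletonSketch.class.getClassLoader();
        // Hypothetical child loader standing in for the one that can see the job classes.
        ClassLoader child = new URLClassLoader(new URL[0], parent);

        Resolver custom = new Resolver(child);
        System.out.println("instance call sees child loader: "
                + (custom.loaderUsedByInstanceCall() == child));
        System.out.println("singleton call sees child loader: "
                + (custom.loaderUsedBySingletonCall() == child));
    }
}
```

In this toy the remedy is simply to call through this rather than through get(). Whether the same one-line change is sufficient in Avro itself is not established by this thread.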

Subject: Re: Hadoop 0.23, Avro Specific 1.6.3 and "org.apache.avro.generic.GenericData$Record cannot be cast to "
Date: Sun, 13 May 2012 16:29:07 -0700

Hi Jacob,
On May 13, 2012, at 2:03pm, Jacob Metcalf wrote:
Ken, thanks for getting back to me.
1) The Avro specific classes are generated and packed in the same JAR as the mapper and reducer. Attached is my example http://markmail.org/download.xqy?id=m6te4atgmyrrqyv5&number=1 which in parallel I am also getting working on MRUnit, so am discussing on that forum. If you want to build it you will need to build odiago-avro.
I agree and cannot comprehend how if the mapper can serialize, the reducer cannot deserialize. My only guess is that the reducer is running in a separate JVM and it is only this which has classpath issues. Logically the mapper output would be deserialized before my reducer is instantiated. I noticed that the JAR does get exploded so my only thought is that there is something going wrong in the Cygwin/Hadoop layer at reduction.
2) Yes the latest version of avro is in my Job Jar. However I am again not sure how to manipulate the Hadoop classpath to ensure it is first. This is possibly more a topic for the Hadoop list.
Two comments…
1. Your pom.xml doesn't look like it's set up to build a proper Hadoop job jar.
After running "mvn assembly:assembly" you should have a job jar that has a lib subdirectory, and inside of that sub-dir you'll have all of the jars (NOT the classes) for your dependent jars such as avro.
See http://exported.wordpress.com/2010/01/30/building-hadoop-job-jar-with-maven/
After running mvn assembly:assembly in your example directory I get a target/hadoop-example.jar file that's got Hadoop classes (and a bunch of others) all jammed inside it.
And your job jar shouldn't have Hadoop classes or jars inside it - those should be provided.
2. I would suggest using Hadoop 0.20.2 if you're on Cygwin.
That version avoids issues with Hadoop not being able to set permissions on local file system directories.
Subject: Re: Hadoop 0.23, Avro Specific 1.6.3 and "org.apache.avro.generic.GenericData$Record cannot be cast to "
Date: Sun, 13 May 2012 11:18:13 -0700

Hi Jacob,
On May 13, 2012, at 4:48am, Jacob Metcalf wrote:
I have just spent several frustrating hours on getting an example MR job using Avro working with Hadoop and after finally getting it working I thought I would share my findings with everyone.
I wrote an example job trying to use Avro MR 1.6.3 to serialize between Map and Reduce then attempted to deploy and run. I am setting up a development cluster with Hadoop 0.23 running pseudo-distributed under cygwin. I ran my job and it failed with:
"org.apache.avro.generic.GenericData$Record cannot be cast to net.jacobmetcalf.avro.Room"
Where Room is an Avro generated class. I found two problems. The first I have partly solved, the second one is more to do with Hadoop and is as yet unsolved:
1) Why when I am using Avro Specific does it end up going Generic?
When deserializing, SpecificDatumReader.java attempts to instantiate your target class through reflection. If it fails to create your class it defaults to a GenericData.Record. Doug has explained this here: http://mail-archives.apache.org/mod_mbox/avro-user/201101.mbox/%[EMAIL PROTECTED]%3E But why it was doing so in my case was a little harder to work out. Debugging, I saw the SpecificDatumReader could not find my class in its classpath. However in my Job Runner I had done:
job.setJarByClass(HouseAssemblyJob.class); // This should ensure the JAR is distributed around the cluster
I expected that with this Hadoop would distribute my Jar around the cluster. It may be doing the distribution but it definitely did not add it to the Reducer's classpath. So to get round this I have now set HADOOP_CLASSPATH to the directory I am running from. This is not going to work in a real cluster where the Job Runner