pig native map-reduce and avro input format
Jonas Hartwig 2013-01-31, 08:53
Hi everyone,
I need some help running my map-reduce job from Pig. I wrote a map-reduce job that takes an Avro file as input:

    job.setJarByClass(Main.class);
    job.setJobName("MapReduceJob");
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(DocumentRepresentation.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setInputFormatClass(AvroKeyInputFormat.class);

If I run this job using the hadoop jar command, everything works fine and the output of the map-reduce job is as expected.

Now I need this map-reduce job to work inside a Pig script. To test the MR job I used this test script:

    ...register statements
    A = LOAD 'mr-input' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
    B = MAPREDUCE 'mr-job-0.0.1.jar'
        STORE A INTO 'mr-tmp' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
        LOAD 'mr-result' AS (prefix: chararray, result: chararray)
        `com.mycompany.hadoop.Main mr-tmp mr-result ...more parameters`;

All the mappers fail with the error:

    LongWritable cannot be cast to AvroMapper

The mapper definition looks like this:

    public class Mapper extends Mapper<AvroWrapper<Record>, NullWritable, IntWritable, DocumentRepresentation> {

Any idea how to fix it?

Jonas