pig native map-reduce and avro input format
Hi everyone,

I need some help running my map-reduce job with Pig.
I wrote a map-reduce job that takes an Avro file as input:
    job.setJarByClass(Main.class);
    job.setJobName("MapReduceJob");
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(DocumentRepresentation.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setInputFormatClass(AvroKeyInputFormat.class);
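
(For reference, a minimal self-contained driver along these lines, reusing the MyMapper, MyReducer, and DocumentRepresentation classes from this job, and assuming Avro's new-API mapreduce classes; the placeholder schema, the Text output value class, and the paths-from-args wiring are illustrative guesses, not taken from the original post:)

    import org.apache.avro.Schema;
    import org.apache.avro.mapreduce.AvroJob;
    import org.apache.avro.mapreduce.AvroKeyInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Main {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "MapReduceJob");
            job.setJarByClass(Main.class);
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(DocumentRepresentation.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);  // guess: not shown in the post
            job.setInputFormatClass(AvroKeyInputFormat.class);
            // AvroKeyInputFormat picks the reader schema up from the job config;
            // this is a placeholder schema, not the real one.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Doc\",\"fields\":"
                + "[{\"name\":\"text\",\"type\":\"string\"}]}");
            AvroJob.setInputKeySchema(job, schema);
            // Pig's MAPREDUCE operator passes the store/load paths as arguments.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }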
If I run this job using the hadoop jar command, everything works fine and the output of the map-reduce job is as expected.
Now I need this map-reduce job to work inside a Pig script. To test the mr-job I used this test script:

...register statements

A = LOAD 'mr-input' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
B = MAPREDUCE 'mr-job-0.0.1.jar' STORE A INTO 'mr-tmp' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
                LOAD 'mr-result' AS (prefix: chararray, result: chararray)
                `com.mycompany.hadoop.Main mr-tmp mr-result ...more parameters`;

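As I understand the MAPREDUCE operator, Pig first materializes A into 'mr-tmp' via the STORE clause, then launches the jar with the backticked arguments, roughly equivalent to:

    hadoop jar mr-job-0.0.1.jar com.mycompany.hadoop.Main mr-tmp mr-result ...

and finally reads 'mr-result' back via the LOAD clause.
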
All the mappers fail with the error: LongWritable cannot be cast to AvroMapper.
The mapper definition looks like this:

public class MyMapper extends Mapper<AvroWrapper<Record>, NullWritable, IntWritable, DocumentRepresentation> {

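Since LongWritable map keys are what Hadoop's default TextInputFormat produces, my guess is that AvroKeyInputFormat is somehow not in effect when Pig launches the jar. For comparison, a mapper written directly against AvroKeyInputFormat (which delivers AvroKey<T> keys; AvroKey extends AvroWrapper) would look something like this sketch, with GenericRecord and DocumentMapper as stand-in names and DocumentRepresentation being the custom Writable from above:

    import java.io.IOException;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DocumentMapper
            extends Mapper<AvroKey<GenericRecord>, NullWritable, IntWritable, DocumentRepresentation> {

        @Override
        protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
                throws IOException, InterruptedException {
            GenericRecord record = key.datum();  // unwrap the Avro record
            // ... build a DocumentRepresentation from record and context.write(...) it
        }
    }
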
Any idea how to fix it?

Jonas
Replies:
Cheolsoo Park 2013-01-31, 18:40
Russell Jurney 2013-01-31, 19:05
Jonas Hartwig 2013-01-31, 19:25