Pig >> mail # user >> pig native map-reduce and avro input format


pig native map-reduce and avro input format
Hi everyone,

I need some help running my map-reduce job with Pig.
I wrote a map-reduce job that takes an Avro file as input:
    job.setJarByClass(Main.class);
    job.setJobName("MapReduceJob");
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(DocumentRepresentation.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setInputFormatClass(AvroKeyInputFormat.class);
If I run this job with the hadoop jar command, everything works fine and the output of the map-reduce job is as expected.
Now I need this map-reduce job to work inside a Pig script. To test the job I used this test script:

...register statements

A = LOAD 'mr-input' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
B = MAPREDUCE 'mr-job-0.0.1.jar' STORE A INTO 'mr-tmp' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
                LOAD 'mr-result' AS (prefix: chararray, result: chararray)
                `com.mycompany.hadoop.Main mr-tmp mr-result ...more parameters`;

All the mappers fail with the error: LongWritable cannot be cast to AvroMapper.
The mapper definition looks like this:

public class Mapper extends Mapper<AvroWrapper<Record>, NullWritable, IntWritable, DocumentRepresentation> {

Any idea how to fix it?

Jonas
Cheolsoo Park 2013-01-31, 18:40
Russell Jurney 2013-01-31, 19:05
Jonas Hartwig 2013-01-31, 19:25