Pig >> mail # user >> pig native map-reduce and avro input format


Re: pig native map-reduce and avro input format
Hi Jonas,

I had to do a similar job before. What I did was store a relation as Avro
files and bulkload them into HBase. You can see my example here:
https://github.com/piaozhexiu/hbase-bulkload-avro
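For context, storing a relation as Avro files in Pig usually looks something
like this (a minimal sketch; the jar path, input path, and field names are
placeholders, and AvroStorage here is the piggybank version used elsewhere in
this thread):

```pig
-- Register the piggybank jar and Avro dependencies first (paths are hypothetical).
REGISTER piggybank.jar;

-- Load any relation, then store it as Avro files.
data = LOAD 'input' AS (id: int, doc: chararray);
STORE data INTO 'avro-out'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```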

It's hard to tell what went wrong without seeing your code. But your Pig
command seems correct to me.

Thanks,
Cheolsoo
On Thu, Jan 31, 2013 at 12:53 AM, Jonas Hartwig <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> I need some help to run my map-reduce job with pig.
> I wrote a map-reduce job that takes an avro file as input:
>               job.setJarByClass(Main.class);
>               job.setJobName("MapReduceJob");
>               job.setMapperClass(Mapper.class);
>               job.setReducerClass(Reducer.class);
>               job.setMapOutputKeyClass(IntWritable.class);
>               job.setMapOutputValueClass(DocumentRepresentation.class);
>               job.setOutputKeyClass(LongWritable.class);
>               job.setInputFormatClass(AvroKeyInputFormat.class);
> If I run this job using the Hadoop jar command, everything works fine and
> the output of the map-reduce job is as expected.
> Now I need this map-reduce job to work inside a Pig script. To test the
> mr-job I used this test script:
>
> ...register statements
>
> A = LOAD 'mr-input' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> B = MAPREDUCE 'mr-job-0.0.1.jar' STORE A INTO 'mr-tmp' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
>                 LOAD 'mr-result' AS (prefix: chararray, result: chararray)
>                 `com.mycompany.hadoop.Main mr-tmp mr-result ...more
> parameters`;
>
> All the mappers fail with the error: LongWritable cannot be cast to
> AvroMapper.
> The mapper definition looks like this:
>
> public class Mapper extends Mapper<AvroWrapper<Record>, NullWritable,
> IntWritable, DocumentRepresentation> {
>
> Any idea how to fix it?
>
> Jonas
>
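For reference, a mapper that consumes input from the new-API
AvroKeyInputFormat (org.apache.avro.mapreduce) typically declares
AvroKey, not AvroWrapper, as its input key type, with NullWritable values.
The class name, record type, and field name below are illustrative only, not
taken from the code in this thread:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch: AvroKeyInputFormat delivers each record to the mapper
// as an AvroKey<GenericRecord> paired with a NullWritable value.
public class AvroRecordMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, IntWritable, Text> {

    @Override
    protected void map(AvroKey<GenericRecord> key, NullWritable value,
                       Context context)
            throws java.io.IOException, InterruptedException {
        // key.datum() unwraps the underlying Avro record.
        GenericRecord record = key.datum();
        // The field name "id" is hypothetical.
        context.write(new IntWritable((Integer) record.get("id")),
                      new Text(record.toString()));
    }
}
```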