Pig >> mail # user >> pig native map-reduce and avro input format


Re: pig native map-reduce and avro input format
Can you describe the avro stuff you're loading?

Russell Jurney http://datasyndrome.com

On Jan 31, 2013, at 10:41 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Jonas,
>
> I had to do a similar job before. What I did was store a relation as Avro
> files and bulkload them into HBase. You can see my example here:
> https://github.com/piaozhexiu/hbase-bulkload-avro
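> For illustration, the Avro-storing step by itself looks roughly like this
> (the relation name and schema string are placeholders):
>
>     data = LOAD 'input' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>     STORE data INTO 'avro-out' USING
>         org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{...}');
>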
>
> It's hard to tell what went wrong without seeing your code. But your Pig
> command seems correct to me.
>
> Thanks,
> Cheolsoo
>
>
> On Thu, Jan 31, 2013 at 12:53 AM, Jonas Hartwig <[EMAIL PROTECTED]>wrote:
>
>> Hi everyone,
>>
>> I need some help running my map-reduce job from Pig.
>> I wrote a map-reduce job that takes an Avro file as input:
>>              job.setJarByClass(Main.class);
>>              job.setJobName("MapReduceJob");
>>              job.setMapperClass(Mapper.class);
>>              job.setReducerClass(Reducer.class);
>>              job.setMapOutputKeyClass(IntWritable.class);
>>              job.setMapOutputValueClass(DocumentRepresentation.class);
>>              job.setOutputKeyClass(LongWritable.class);
>>              job.setInputFormatClass(AvroKeyInputFormat.class);
>> If I run this job using the hadoop jar command, everything works fine and
>> the output of the map-reduce job is as expected.
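>> (Roughly:
>>
>>              hadoop jar mr-job-0.0.1.jar com.mycompany.hadoop.Main <input> <output>
>>
>> where the exact arguments depend on the Main class.)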
>> Now I need this map-reduce job to run inside a Pig script. To test the
>> mr job I used this script:
>>
>> ...register statements
>>
>> A = LOAD 'mr-input' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>> B = MAPREDUCE 'mr-job-0.0.1.jar' STORE A INTO 'mr-tmp' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
>>                LOAD 'mr-result' AS (prefix: chararray, result: chararray)
>>                `com.mycompany.hadoop.Main mr-tmp mr-result ...more
>> parameters`;
>>
>> All the mappers fail with the error: LongWritable cannot be cast to
>> AvroMapper.
>> The mapper definition looks like this:
>>
>> public class Mapper extends
>> org.apache.hadoop.mapreduce.Mapper<AvroWrapper<Record>, NullWritable,
>> IntWritable, DocumentRepresentation> {
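>> (For reference: AvroKeyInputFormat delivers AvroKey<Record> keys, which
>> extend AvroWrapper<Record>, paired with NullWritable values, so a matching
>> signature sketch, with a hypothetical class name, would be:
>>
>>              public class MyMapper extends
>>                      org.apache.hadoop.mapreduce.Mapper<AvroKey<Record>, NullWritable,
>>                              IntWritable, DocumentRepresentation> { ... }
>> )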
>>
>> Any idea how to fix it?
>>
>> Jonas
>>