Pig >> mail # user >> pig native map-reduce and avro input format

Re: pig native map-reduce and avro input format
It's a standard Avro file with the schema included. It has about 15 fields: longs, strings, and [null, string] unions.


Russell Jurney <[EMAIL PROTECTED]> wrote:
Can you describe the Avro data you're loading?

Russell Jurney http://datasyndrome.com

On Jan 31, 2013, at 10:41 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Jonas,
> I had to do a similar job before. What I did was store a relation as Avro
> files and bulkload them into HBase. You can see my example here:
> https://github.com/piaozhexiu/hbase-bulkload-avro
> It's hard to tell what went wrong without seeing your code. But your Pig
> command seems correct to me.
> Thanks,
> Cheolsoo
> On Thu, Jan 31, 2013 at 12:53 AM, Jonas Hartwig <[EMAIL PROTECTED]> wrote:
>> Hi everyone,
>> I need some help running my map-reduce job from Pig.
>> I wrote a map-reduce job that takes an Avro file as input:
>>              job.setJarByClass(Main.class);
>>              job.setJobName("MapReduceJob");
>>              job.setMapperClass(Mapper.class);
>>              job.setReducerClass(Reducer.class);
>>              job.setMapOutputKeyClass(IntWritable.class);
>>              job.setMapOutputValueClass(DocumentRepresentation.class);
>>              job.setOutputKeyClass(LongWritable.class);
>>              job.setInputFormatClass(AvroKeyInputFormat.class);
>> If I run this job using the hadoop jar command, everything works fine and
>> the output of the map-reduce job is as expected.
>> Now I need this map-reduce job to work inside a Pig script. To test the
>> mr-job I used this test script:
>> ...register statements
>> A = LOAD 'mr-input' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>> B = MAPREDUCE 'mr-job-0.0.1.jar' STORE A INTO 'mr-tmp' USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '...')
>>                LOAD 'mr-result' AS (prefix: chararray, result: chararray)
>>                `com.mycompany.hadoop.Main mr-tmp mr-result ...more
>> parameters`;
>> All the mappers fail with the error: LongWritable cannot be cast to
>> AvroMapper.
>> The mapper definition looks like this:
>> public class Mapper extends Mapper<AvroWrapper<Record>, NullWritable,
>> IntWritable, DocumentRepresentation> {
>> Any idea how to fix it?
>> Jonas