Pig, mail # dev - Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?


Russell Jurney 2012-08-19, 14:20
Re: Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?
Russell Jurney 2012-08-19, 19:30
Figured this out: I was ping-ponging between the old mapred and the new mapreduce APIs, which define unrelated InputFormat types.

package org.apache.pig.piggybank.storage.arc;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.nutch.tools.arc.ArcInputFormat;
import org.apache.nutch.tools.arc.ArcRecordReader;

import java.io.IOException;

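// Old-API (org.apache.hadoop.mapred) input format that reads ARC files
// through Nutch's ArcRecordReader.
public class PigArcInputFormat extends FileInputFormat<Text, BytesWritable>
{

    public PigArcInputFormat() {
    }

    // Returns a new instance of Nutch's ArcInputFormat.
    public ArcInputFormat getInputFormat() throws IOException {
        return new ArcInputFormat();
    }

    // Hands each file split to Nutch's ArcRecordReader.
    @Override
    public RecordReader<Text, BytesWritable> getRecordReader(InputSplit split,
            JobConf config, Reporter reporter) throws IOException {
        return new ArcRecordReader(config, (FileSplit) split);
    }
}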
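
For anyone who hits the same compiler error, here is a minimal sketch of why the override is rejected. In Hadoop, org.apache.hadoop.mapred.InputFormat is an interface and org.apache.hadoop.mapreduce.InputFormat is a separate abstract class, and Pig's LoadFunc.getInputFormat() is declared against the mapreduce one. Judging by the imports above, the Nutch Arc classes sit on the mapred side. The names OldApi, NewApi, ArcLikeInputFormat, LoaderSketch and isAssignable below are hypothetical stand-ins so the example runs without Hadoop or Pig on the classpath; they are not the real classes.

```java
// Hypothetical stand-ins that only mimic the shape of the two Hadoop APIs.
public class ApiMixupSketch {

    /** Mimics the old API, where InputFormat is an interface. */
    static final class OldApi {
        interface InputFormat<K, V> {}
        static abstract class FileInputFormat<K, V> implements InputFormat<K, V> {}
        /** Plays the role of an old-API format such as ArcInputFormat. */
        static class ArcLikeInputFormat extends FileInputFormat<String, byte[]> {}
    }

    /** Mimics the new API, where InputFormat is an abstract class. */
    static final class NewApi {
        static abstract class InputFormat<K, V> {}
        static abstract class FileInputFormat<K, V> extends InputFormat<K, V> {}
    }

    /** Plays the role of a loader whose contract is written against the new API. */
    static abstract class LoaderSketch {
        abstract NewApi.InputFormat<?, ?> getInputFormat();
        // A subclass that tried
        //     OldApi.ArcLikeInputFormat getInputFormat() { ... }
        // would not compile: the return type is not a subtype of
        // NewApi.InputFormat, so it is an incompatible return type.
    }

    /** True when a value of type candidate can be used where target is expected. */
    static boolean isAssignable(Class<?> target, Class<?> candidate) {
        return target.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        // Same simple name, same API generation: compatible.
        System.out.println(isAssignable(
                OldApi.InputFormat.class, OldApi.ArcLikeInputFormat.class)); // true
        // Same simple name, different API generation: unrelated types.
        System.out.println(isAssignable(
                NewApi.InputFormat.class, OldApi.ArcLikeInputFormat.class)); // false
    }
}
```

The shared simple names (InputFormat, FileInputFormat) are what make the mix-up easy to miss: which type you get depends entirely on which package the import line pulls in.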
On Sun, Aug 19, 2012 at 7:20 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> I am writing a LoadFunc called ArcFileReader to load Common Crawl data in
> ArcFile format. There is already an ArcRecord, ArcRecordReader and
> ArcInputFormat for Hadoop.
>
> ArcInputFormat extends Hadoop's FileInputFormat, which implements Hadoop's
> InputFormat interface. Why then can't I specify ArcInputFormat as my
> InputFormat in my LoadFunc?
>
>     @Override
>     public InputFormat getInputFormat() throws IOException {
>         return new ArcInputFormat();
>     }
>
>
> Java complains that I am attempting to use an incompatible return type. What gives?
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com