Pig >> mail # dev >> Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?


Russell Jurney 2012-08-19, 14:20
Re: Problem writing LoadFunc - why can't I use a sub-class of FileInputFormat as my InputFormat?
Figured this out - I was ping-ponging between the old mapred and the new mapreduce APIs.

package org.apache.pig.piggybank.storage.arc;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.nutch.tools.arc.ArcInputFormat;
import org.apache.nutch.tools.arc.ArcRecordReader;

import java.io.IOException;

public class PigArcInputFormat extends FileInputFormat<Text, BytesWritable>
{

    public PigArcInputFormat() {
    }

    public ArcInputFormat getInputFormat() throws IOException {
        return new ArcInputFormat();
    }

    public RecordReader<Text, BytesWritable> getRecordReader(InputSplit split,
            JobConf config, Reporter reporter) throws IOException {
        return new ArcRecordReader(config, (FileSplit) split);
    }
}
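
To illustrate the mix-up: Pig's LoadFunc.getInputFormat() is declared to return org.apache.hadoop.mapreduce.InputFormat (an abstract class), while an input format built on org.apache.hadoop.mapred.FileInputFormat implements the unrelated org.apache.hadoop.mapred.InputFormat interface. The two types share a simple name but nothing else. The sketch below is only a model of that relationship. OldApi, NewApi, ArcLikeInputFormat, fitsOldApi and fitsNewApi are hypothetical stand-ins invented for the example, not real Hadoop, Pig or Nutch classes.

```java
// Hypothetical stand-ins for the two Hadoop API generations. These are NOT the
// real org.apache.hadoop classes, only a model of how the types relate.
public class ApiMismatchDemo {

    // Models the old API (org.apache.hadoop.mapred): InputFormat is an interface.
    public static class OldApi {
        public interface InputFormat<K, V> { }
        public abstract static class FileInputFormat<K, V> implements InputFormat<K, V> { }
    }

    // Models the new API (org.apache.hadoop.mapreduce): InputFormat is an abstract class.
    public static class NewApi {
        public abstract static class InputFormat<K, V> { }
    }

    // Models an input format written against the old API (assumed here to be
    // the situation with ArcInputFormat, based on the mapred imports above).
    public static class ArcLikeInputFormat extends OldApi.FileInputFormat<String, byte[]> { }

    // True if the given type could be returned where the old-API type is expected.
    public static boolean fitsOldApi(Class<?> type) {
        return OldApi.InputFormat.class.isAssignableFrom(type);
    }

    // True if the given type could be returned where the new-API type is expected.
    public static boolean fitsNewApi(Class<?> type) {
        return NewApi.InputFormat.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println("fits old API: " + fitsOldApi(ArcLikeInputFormat.class));
        System.out.println("fits new API: " + fitsNewApi(ArcLikeInputFormat.class));
    }
}
```

In this model the old-API class is assignable only to the old-API InputFormat, so a method declared to return the new-API type cannot return it. That is consistent with the "incompatible return type" complaint in the original question.
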
On Sun, Aug 19, 2012 at 7:20 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> I am writing a LoadFunc called ArcFileReader to load Common Crawl data in
> ArcFile format. There is already an ArcRecord, ArcRecordReader and
> ArcInputFormat for Hadoop.
>
> ArcInputFormat extends Hadoop's FileInputFormat, which implements Hadoop's
> InputFormat interface. Why then can't I specify ArcInputFormat as my
> InputFormat in my LoadFunc?
>
>     @Override
>     public InputFormat getInputFormat() throws IOException {
>         return new ArcInputFormat();
>     }
>
>
> Java complains - attempting to use incompatible return type. What gives?
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com