Pig, mail # user - StoreFunc with Sequence file


Re: StoreFunc with Sequence file
Gianmarco De Francisci Morales 2011-10-31, 11:28
Thanks Ashutosh,

your suggestion helped.
Actually, I am loading data using PigStorage, so my output <key, value>
pairs are declared as <NullableText, NullableTuple>.

By declaring my getOutputFormat() to return
a SequenceFileOutputFormat<NullableText, NullableTuple>, I managed to make
it work.

The downside is that now I need to wrap my bytes in a Tuple and wrap the
Tuple in a NullableTuple.
Is this the intended way it should work?
Why not let the user use any <WritableComparable, Writable> pair instead?
It should be possible for Pig to use the classes defined by the user in the
StoreFunc in order to define the OutputKeyClass and OutputValueClass.

Cheers,
--
Gianmarco
On Fri, Oct 28, 2011 at 19:15, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:

> Hey Gianmarco,
>
> How are you loading data in your Pig script? Using your own LoadFunc? Pig
> declares the following types to the MR framework:
> Map:
>   KeyIn: Text, ValueIn: Tuple
> Reducer:
>   KeyOut: PigNullableWritable, ValueOut: Writable
>
> So, your LoadFunc/StoreFunc key and value types must extend these.
>
> Hope it helps,
> Ashutosh
>
> On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales
> <[EMAIL PROTECTED]> wrote:
>
> > Hi pig users,
> > I implemented a custom StoreFunc to write some data in a binary format
> to a
> > Sequence File.
> >
> >    private RecordWriter<NullWritable, BytesWritable> writer;
> >    private BytesWritable bytes;
> >    private DataOutputBuffer dob;
> >
> >    @SuppressWarnings("rawtypes")
> >    @Override
> >    public OutputFormat getOutputFormat() throws IOException {
> >        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
> >    }
> >
> >    @SuppressWarnings({ "rawtypes", "unchecked" })
> >    @Override
> >    public void prepareToWrite(RecordWriter writer) throws IOException {
> >        this.writer = writer;
> >        this.bytes = new BytesWritable();
> >        this.dob = new DataOutputBuffer();
> >    }
> >
> >    @Override
> >    public void putNext(Tuple tuple) throws IOException {
> >        dob.reset();
> >        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
> >        DataBag childTracesBag = (DataBag) tuple.get(1);
> >        WritableUtils.writeVLong(dob, childTracesBag.size());
> >        for (Tuple t : childTracesBag) {
> >            WritableUtils.writeVInt(dob, (Integer) t.get(0));
> >            dob.writeLong((Long) t.get(1));
> >        }
> >        try {
> >            bytes.set(dob.getData(), 0, dob.getLength());
> >            writer.write(NullWritable.get(), bytes);
> >        } catch (InterruptedException e) {
> >            e.printStackTrace();
> >        }
> >    }
> >
> >
> > But I get this exception:
> >
> >
> > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > recreate exception from backed error: java.io.IOException:
> > java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable
> > is not class org.apache.pig.impl.io.NullableText
> >
> >
> >
> > And if I use a NullableText instead of a NullWritable, I get this other
> > exception:
> >
> >
> > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > recreate exception from backed error: java.io.IOException:
> > java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
> > is not class org.apache.pig.impl.io.NullableTuple
> >
> >
> >
> > There must be something I am doing wrong in telling Pig the types of the
> > sequence file.
> >
> > It must be a stupid problem but I don't see it.
> >
> > Does anybody have a clue?
> >
> >
> > Thanks,
> > --
> > Gianmarco
> >
>
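[Editor's note] For reference, the working approach Gianmarco describes at the top of the thread (declaring <NullableText, NullableTuple> in getOutputFormat() and wrapping the serialized bytes first in a Tuple, then in a NullableTuple) can be sketched roughly as below. This is an untested sketch against Pig-0.x-era APIs; the class name BinarySequenceStorage and the serialize() helper are illustrative, not from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.io.NullableText;
import org.apache.pig.impl.io.NullableTuple;

// Sketch: a StoreFunc whose key/value classes match what Pig actually
// hands to the MR framework (PigNullableWritable subclasses), per the
// error messages and Ashutosh's explanation above.
public class BinarySequenceStorage extends StoreFunc {

    private RecordWriter<NullableText, NullableTuple> writer;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @SuppressWarnings("rawtypes")
    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // Declare the Pig wrapper types, not the raw Writable types.
        return new SequenceFileOutputFormat<NullableText, NullableTuple>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        BytesWritable bytes = serialize(tuple);
        // The double wrap Gianmarco mentions: bytes -> Tuple -> NullableTuple.
        Tuple wrapped = tupleFactory.newTuple(bytes);
        try {
            writer.write(new NullableText(), new NullableTuple(wrapped));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    // Hypothetical serialization helper mirroring the original putNext() body.
    private BytesWritable serialize(Tuple tuple) throws IOException {
        DataOutputBuffer dob = new DataOutputBuffer();
        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
        DataBag bag = (DataBag) tuple.get(1);
        WritableUtils.writeVLong(dob, bag.size());
        for (Tuple t : bag) {
            WritableUtils.writeVInt(dob, (Integer) t.get(0));
            dob.writeLong((Long) t.get(1));
        }
        BytesWritable bytes = new BytesWritable();
        bytes.set(dob.getData(), 0, dob.getLength());
        return bytes;
    }
}
```

As the thread notes, this wrapping overhead exists because Pig fixes the reducer output types globally; a StoreFunc cannot substitute arbitrary <WritableComparable, Writable> pairs of its own.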