Pig, mail # user - StoreFunc with Sequence file


Gianmarco De Francisci Mo... 2011-10-28, 16:37
Ashutosh Chauhan 2011-10-28, 17:15
Gianmarco De Francisci Mo... 2011-10-31, 11:28
Re: StoreFunc with Sequence file
Ashutosh Chauhan 2011-11-01, 00:55
Actually, what I said was not entirely correct. Per Daniel, Pig's load/store
funcs are designed to work with any InputFormat/OutputFormat that operates on
<WritableComparable, Writable> pairs, so what you are seeing is not expected.
Can you paste the Pig script you are using and the detailed stack trace? You
can find the stack trace in the JobTracker log.

Hope it helps,
Ashutosh
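
For readers following the thread, the "wrong key class" error quoted further down can be reproduced in miniature. The sketch below is stdlib-only Java and rests on an assumption: that Hadoop's SequenceFile writer compares each record's key class against the key class declared for the job with an exact class match, not an instanceof test. CheckedWriter, DeclaredKey and OtherKey are hypothetical stand-ins, not Hadoop or Pig classes; DeclaredKey plays the role of NullableText and OtherKey the role of NullWritable.

```java
import java.io.IOException;

/** Stand-alone illustration; nothing here is a real Hadoop or Pig class. */
public class KeyClassCheckSketch {

    /** Hypothetical stand-in for the key class the job declared (e.g. NullableText). */
    public static final class DeclaredKey {}

    /** Hypothetical stand-in for the key class the StoreFunc actually wrote (e.g. NullWritable). */
    public static final class OtherKey {}

    /** Hypothetical writer that is fixed to one key class and one value class at creation. */
    public static final class CheckedWriter {
        private final Class<?> keyClass;
        private final Class<?> valueClass;
        private int records;

        public CheckedWriter(Class<?> keyClass, Class<?> valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }

        /** Accepts a record only if both classes match the declared ones exactly. */
        public void append(Object key, Object value) throws IOException {
            // Assumption: exact class comparison, so even a subclass would be rejected.
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: " + key.getClass().getName()
                        + " is not " + keyClass);
            }
            if (value.getClass() != valueClass) {
                throw new IOException("wrong value class: " + value.getClass().getName()
                        + " is not " + valueClass);
            }
            records++;
        }

        public int recordCount() {
            return records;
        }
    }

    /** Returns null if the record was accepted, otherwise the error message. */
    public static String tryAppend(CheckedWriter writer, Object key, Object value) {
        try {
            writer.append(key, value);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The writer is created with the classes the framework declared, not the ones the caller intends to write.
        CheckedWriter writer = new CheckedWriter(DeclaredKey.class, byte[].class);
        System.out.println(tryAppend(writer, new OtherKey(), new byte[0]));    // rejected: key class differs
        System.out.println(tryAppend(writer, new DeclaredKey(), new byte[0])); // accepted: prints null
        System.out.println(writer.recordCount());
    }
}
```

If that assumption holds, it would explain why returning a SequenceFileOutputFormat<NullableText, NullableTuple> made the error go away in the message quoted below: the records then matched the classes that had been declared to the framework.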

On Mon, Oct 31, 2011 at 04:28, Gianmarco De Francisci Morales <
[EMAIL PROTECTED]> wrote:

> Thanks Ashutosh,
>
> your suggestion helped.
> Actually, I am loading data using PigStorage, so my output <key, value>
> pairs are declared as <NullableText, NullableTuple>.
>
> By declaring my getOutputFormat() to return
> a SequenceFileOutputFormat<NullableText, NullableTuple>() I managed to make
> it work.
>
> The downside is that now I need to wrap my bytes in a Tuple and wrap the
> Tuple in a NullableTuple.
> Is this the intended way it should work?
> Why not let the user use any <WritableComparable, Writable> pair instead?
> It should be possible for Pig to use the classes defined by the user in the
> StoreFunc in order to define the OutputKeyClass and OutputValueClass.
>
> Cheers,
> --
> Gianmarco
>
>
> On Fri, Oct 28, 2011 at 19:15, Ashutosh Chauhan <[EMAIL PROTECTED]>
> wrote:
>
> > Hey Gianmarco,
> >
> > How are you loading data in your Pig script? Are you using your own
> > LoadFunc? Pig declares the following types to the MR framework:
> > Map:
> >  KeyIn: Text, ValueIn: Tuple
> > Reducer:
> >  KeyOut: PigNullableWritable, ValueOut: Writable
> >
> > So, your LoadFunc/StoreFunc key and value types must extend from these.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi pig users,
> > > > I implemented a custom StoreFunc to write some data in a binary format
> > > > to a Sequence File.
> > >
> > > >    private RecordWriter<NullWritable, BytesWritable> writer;
> > > >    private BytesWritable bytes;
> > > >    private DataOutputBuffer dob;
> > > >
> > > >    @SuppressWarnings("rawtypes")
> > > >    @Override
> > > >    public OutputFormat getOutputFormat() throws IOException {
> > > >        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
> > > >    }
> > > >
> > > >    @SuppressWarnings({ "rawtypes", "unchecked" })
> > > >    @Override
> > > >    public void prepareToWrite(RecordWriter writer) throws IOException {
> > > >        this.writer = writer;
> > > >        this.bytes = new BytesWritable();
> > > >        this.dob = new DataOutputBuffer();
> > > >    }
> > > >
> > > >    @Override
> > > >    public void putNext(Tuple tuple) throws IOException {
> > > >        dob.reset();
> > > >        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
> > > >        DataBag childTracesBag = (DataBag) tuple.get(1);
> > > >        WritableUtils.writeVLong(dob, childTracesBag.size());
> > > >        for (Tuple t : childTracesBag) {
> > > >            WritableUtils.writeVInt(dob, (Integer) t.get(0));
> > > >            dob.writeLong((Long) t.get(1));
> > > >        }
> > > >        try {
> > > >            bytes.set(dob.getData(), 0, dob.getLength());
> > > >            writer.write(NullWritable.get(), bytes);
> > > >        } catch (InterruptedException e) {
> > > >            e.printStackTrace();
> > > >        }
> > > >    }
> > >
> > >
> > > But I get this exception:
> > >
> > >
> > > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > > recreate exception from backed error: java.io.IOException:
> > > > java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is
> > > > not class org.apache.pig.impl.io.NullableText
> > >
> > >
> > >
> > > And if I use a NullableText instead of a NullWritable, I get this other
> > > exception:
> > >
> > >
> > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > recreate exception from backed error: java.io.IOException:
Gianmarco De Francisci Mo... 2011-11-03, 17:24