Pig, mail # user - StoreFunc with Sequence file


Gianmarco De Francisci Mo... 2011-10-28, 16:37
Re: StoreFunc with Sequence file
Ashutosh Chauhan 2011-10-28, 17:15
Hey Gianmarco,

How are you loading data in your Pig script? With your own LoadFunc? Pig
declares the following types to the MR framework:
Map:
  KeyIn: Text, ValueIn: Tuple
Reducer:
  KeyOut: PigNullableWritable, ValueOut: Writable

So the key/value types of your LoadFunc/StoreFunc must extend these.

Hope it helps,
Ashutosh
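Concretely, Pig's own shipped key classes already follow this rule. The two class names that appear in the error messages quoted below are both subclasses of PigNullableWritable (a sketch for illustration only; class names are from the org.apache.pig.impl.io package):

```java
import org.apache.pig.impl.io.NullableText;
import org.apache.pig.impl.io.NullableTuple;
import org.apache.pig.impl.io.PigNullableWritable;

public class PigKeyTypes {
    public static void main(String[] args) {
        // Both classes extend PigNullableWritable, which is the key type
        // Pig declares for reducer output. A custom StoreFunc that writes
        // raw NullWritable/BytesWritable pairs does not satisfy this.
        PigNullableWritable k1 = new NullableText();  // wraps a Text
        PigNullableWritable k2 = new NullableTuple(); // wraps a Tuple
        System.out.println(k1.getClass().getSuperclass().getSimpleName());
    }
}
```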

On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales <
[EMAIL PROTECTED]> wrote:

> Hi pig users,
> I implemented a custom StoreFunc to write some data in a binary format to a
> Sequence File.
>
>    private RecordWriter<NullWritable, BytesWritable> writer;
>    private BytesWritable bytes;
>    private DataOutputBuffer dob;
>
>    @SuppressWarnings("rawtypes")
>    @Override
>    public OutputFormat getOutputFormat() throws IOException {
>        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
>    }
>
>    @SuppressWarnings({ "rawtypes", "unchecked" })
>    @Override
>    public void prepareToWrite(RecordWriter writer) throws IOException {
>        this.writer = writer;
>        this.bytes = new BytesWritable();
>        this.dob = new DataOutputBuffer();
>    }
>
>    @Override
>    public void putNext(Tuple tuple) throws IOException {
>        dob.reset();
>        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
>        DataBag childTracesBag = (DataBag) tuple.get(1);
>        WritableUtils.writeVLong(dob, childTracesBag.size());
>        for (Tuple t : childTracesBag) {
>            WritableUtils.writeVInt(dob, (Integer) t.get(0));
>            dob.writeLong((Long) t.get(1));
>        }
>        try {
>            bytes.set(dob.getData(), 0, dob.getLength());
>            writer.write(NullWritable.get(), bytes);
>        } catch (InterruptedException e) {
>            e.printStackTrace();
>        }
>    }
>
>
> But I get this exception:
>
> ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> recreate exception from backed error: java.io.IOException:
> java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is
> not class org.apache.pig.impl.io.NullableText
>
> And if I use a NullableText instead of a NullWritable, I get this other
> exception:
>
> ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> recreate exception from backed error: java.io.IOException:
> java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
> is not class org.apache.pig.impl.io.NullableTuple
>
> There must be something I am doing wrong in telling Pig the types of the
> sequence file.
>
> It must be a stupid problem but I don't see it.
>
> Does anybody have a clue?
>
>
> Thanks,
> --
> Gianmarco
>
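The "wrong key class" message comes from the SequenceFile writer comparing the classes it is given against the key/value classes configured on the job, which Pig has set to its internal types. One possible workaround (a sketch only, not a fix confirmed in this thread) is to override setStoreLocation in the custom StoreFunc and declare the classes that putNext() actually writes:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Inside the custom StoreFunc subclass (hypothetical workaround):
@Override
public void setStoreLocation(String location, Job job) throws IOException {
    FileOutputFormat.setOutputPath(job, new Path(location));
    // Force the job's output classes to match what putNext() emits, so the
    // SequenceFileOutputFormat writer does not expect Pig's internal
    // NullableText/NullableTuple types.
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(BytesWritable.class);
}
```

Whether overriding the job's output classes interferes with the rest of Pig's pipeline depends on the plan; the later replies in this thread (collapsed below) may describe the actual resolution.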
Further replies in this thread:
  Gianmarco De Francisci Mo... 2011-10-31, 11:28
  Ashutosh Chauhan 2011-11-01, 00:55
  Gianmarco De Francisci Mo... 2011-11-03, 17:24