Pig user mailing list: StoreFunc with Sequence file


StoreFunc with Sequence file
Hi pig users,
I implemented a custom StoreFunc to write some data in a binary format to a
SequenceFile.

    private RecordWriter<NullWritable, BytesWritable> writer;
    private BytesWritable bytes;
    private DataOutputBuffer dob;

    @SuppressWarnings("rawtypes")
    @Override
    public OutputFormat getOutputFormat() throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
        this.bytes = new BytesWritable();
        this.dob = new DataOutputBuffer();
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        dob.reset();
        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
        DataBag childTracesBag = (DataBag) tuple.get(1);
        WritableUtils.writeVLong(dob, childTracesBag.size());
        for (Tuple t : childTracesBag) {
            WritableUtils.writeVInt(dob, (Integer) t.get(0));
            dob.writeLong((Long) t.get(1));
        }
        try {
            bytes.set(dob.getData(), 0, dob.getLength());
            writer.write(NullWritable.get(), bytes);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
But I get this exception:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
recreate exception from backed error: java.io.IOException:
java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is
not class org.apache.pig.impl.io.NullableText

And if I use a NullableText instead of a NullWritable, I get this other
exception:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
recreate exception from backed error: java.io.IOException:
java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
is not class org.apache.pig.impl.io.NullableTuple

There must be something I am doing wrong in telling Pig the types of the
sequence file.
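One place the types usually get declared is setStoreLocation: SequenceFileOutputFormat takes its key/value classes from the Job configuration, not from the generic parameters, so if they are never set on the Job it will fall back to whatever Pig configured. A minimal sketch of such an override (this is a guess at the missing piece, assuming the mapreduce API, not a confirmed fix):

```java
// Sketch only: declare the output key/value classes on the Job so that
// SequenceFileOutputFormat does not pick up Pig's own writable types
// (NullableText / NullableTuple) from the job configuration.
@Override
public void setStoreLocation(String location, Job job) throws IOException {
    FileOutputFormat.setOutputPath(job, new Path(location));
    // Must match the types used in getOutputFormat() and putNext():
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(BytesWritable.class);
}
```

This is just a fragment of the StoreFunc, so it needs the usual Hadoop imports (Job, FileOutputFormat, Path, NullWritable, BytesWritable) and only compiles against the Hadoop/Pig jars.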

It must be a stupid problem, but I don't see it.

Does anybody have a clue?
Thanks,
--
Gianmarco
Replies:
- Ashutosh Chauhan, 2011-10-28 17:15
- Gianmarco De Francisci Mo..., 2011-10-31 11:28
- Ashutosh Chauhan, 2011-11-01 00:55
- Gianmarco De Francisci Mo..., 2011-11-03 17:24