Pig >> mail # user >> StoreFunc with Sequence file


Re: StoreFunc with Sequence file
Here is the Pig script (I hope the formatting is kept).
I think I could reduce the script to a simple load/store and still hit the
same problem, but I haven't had time to check (I would need to rewrite
the StoreFunc).
FYI, my StoreFunc tries to write a SequenceFile<NullWritable,
BytesWritable>:

    @Override
    public OutputFormat<NullWritable, BytesWritable> getOutputFormat() throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }
rawtraces = LOAD '$log' AS (follower:chararray, action:int, time:long);
groupedtraces = GROUP rawtraces BY follower;
traces = FOREACH groupedtraces GENERATE group AS performer, rawtraces.(action, time) AS t;

rawsn = LOAD '$network' AS (parent:chararray, child:chararray);
groupedsn = GROUP rawsn BY parent;
sn = FOREACH groupedsn GENERATE group AS parent, rawsn.(child) AS children;

join1 = JOIN traces BY performer, sn BY parent;
cleanJ1 = FOREACH join1 GENERATE traces::performer AS parent, traces::t AS parentTraces, FLATTEN(sn::children) AS child;

groupedJ1 = GROUP cleanJ1 BY child;
intermediate = FOREACH groupedJ1 GENERATE group AS child, cleanJ1.(parent, parentTraces) AS legacy;

join2 = JOIN traces BY performer, intermediate BY child;
result = FOREACH join2 GENERATE traces::performer AS child, traces::t AS childTraces, intermediate::legacy AS legacy;

STORE result INTO '$output' USING mypackage.pig.BinStorage();

And here is the stack trace:

java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.NullWritable is not class
org.apache.pig.impl.io.NullableText
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: wrong key class:
org.apache.hadoop.io.NullWritable is not class
org.apache.pig.impl.io.NullableText
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
at mypackage.pig.BinStorage.putNext(BinStorage.java:75)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
... 11 more
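For what it's worth, the "wrong key class" guard in the trace fires inside SequenceFile.Writer.append(): the writer remembers the key class it was opened with and rejects any other. A plausible reading of the trace (an assumption on my part, not confirmed in this thread) is that the writer's key class is taken from the job configuration rather than from the generic parameters on getOutputFormat() (generics are erased at runtime), and Pig configures the job with its own internal wrapper types such as NullableText. A toy, Hadoop-free sketch of that failure mode (ToySequenceWriter, WrongKeyClassDemo, and the "output.key.class" key are all made up for illustration):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of SequenceFile.Writer: the key class comes from the
// configuration the writer was created with, not from any generic type.
class ToySequenceWriter {
    private final Class<?> keyClass;

    ToySequenceWriter(Map<String, Class<?>> conf) {
        this.keyClass = conf.get("output.key.class");
    }

    void append(Object key) throws IOException {
        // Mirrors the guard that produced the IOException in the trace above.
        if (!key.getClass().equals(keyClass)) {
            throw new IOException("wrong key class: " + key.getClass().getName()
                    + " is not class " + keyClass.getName());
        }
    }
}

public class WrongKeyClassDemo {
    public static void main(String[] args) {
        Map<String, Class<?>> jobConf = new HashMap<>();
        // Stand-in for Pig configuring the job with NullableText.
        jobConf.put("output.key.class", String.class);
        ToySequenceWriter writer = new ToySequenceWriter(jobConf);
        try {
            // Stand-in for the StoreFunc appending a NullWritable key.
            writer.append(new Object());
        } catch (IOException e) {
            // Prints: wrong key class: java.lang.Object is not class java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, the declared types on the OutputFormat never reach the writer; only whatever key/value classes end up in the job configuration do.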
Cheers,
--
Gianmarco

On Tue, Nov 1, 2011 at 01:55, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:

> Actually, what I said was not entirely correct. Per Daniel, Pig's load/store
> funcs are designed to work with InputFormat/OutputFormat, which work on
> <ComparableWritable, Writable>, so what you are seeing is not expected. Can
> you paste the Pig script you are using and the detailed stack trace? You