Re: Pig FLATTEN statement
Thejas Nair 2011-12-02, 00:12
I think the parse udf returns a schema with 5 columns, but not all tuples
returned by the udf have 5 columns. Can you check whether the udf always
returns a tuple with 5 columns?

Thanks,
Thejas

On 11/29/11 1:47 AM, Daan Gerits wrote:
> Hello everyone,
>
> I have some issues with the pig flatten statement as I receive several exceptions when trying to flatten a bag.
>
> I read in Jira and on the mailing lists that other people had issues with flattening a bag which was embedded within a tuple, but that should have been solved in version 0.8.1 and 0.9.0. I've tried using 0.9.0 and 0.9.1, both giving me the same results.
>
> Any help is greatly appreciated,
>
> Daan Gerits
>
> ==================== Snippet Start ====================
>
> parserResults =
>     FOREACH fetchResultsFlattened {
>         parsed = parse('GoogleSearchItems.xml', content);
>         GENERATE queryString, FLATTEN(parsed);
>     }
>
> DESCRIBE parserResults;
> parserResults: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values: {(value: chararray)}}
>
> parserValues =
>     FOREACH parserResults
>     GENERATE queryString, id, path, selector, type, FLATTEN(values);
>
> DESCRIBE parserValues;
> parserValues: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values::value: chararray}
>
> DUMP parserValues;
> java.lang.IndexOutOfBoundsException: Index: 5, Size: 2
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:158)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:575)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
>
> ==================== Snippet End ====================
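
The diagnosis in the reply (a declared schema of 5 columns, but some returned tuples with fewer fields) can be illustrated without Pig at all. The stack trace shows the failing read going through ArrayList.get, so a field list that is shorter than the declared schema fails as soon as a later column is read. Below is a minimal plain-Java sketch of that failure mode and one possible guard. It does not use any Pig classes; the class name TupleArityCheck, the helper padToArity, and the sample values are hypothetical and are not taken from the actual parse udf. Padding with null is only one option; whether that or skipping the record is appropriate depends on the udf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TupleArityCheck {
    // Number of columns the parse udf declares: id, path, selector, type, values.
    static final int DECLARED_COLUMNS = 5;

    // Hypothetical helper: return a copy of the row with exactly `arity` fields,
    // padding missing trailing fields with null and dropping any extras.
    static List<Object> padToArity(List<Object> row, int arity) {
        List<Object> out = new ArrayList<>(arity);
        for (int i = 0; i < arity; i++) {
            out.add(i < row.size() ? row.get(i) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // A row with only 2 fields although 5 columns are declared (made-up values).
        List<Object> shortRow = new ArrayList<>(Arrays.asList("id-1", "/some/path"));

        try {
            // Reading the fifth declared column (index 4) from a 2-field row fails.
            shortRow.get(DECLARED_COLUMNS - 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("short row failed: " + e.getMessage());
        }

        // After padding, every declared column can be read; missing ones are null.
        List<Object> fixed = padToArity(shortRow, DECLARED_COLUMNS);
        System.out.println("padded size: " + fixed.size());
    }
}
```

In a real udf the equivalent check would sit just before the tuple is returned, so that every returned tuple has as many fields as the schema declares.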