Pig >> mail # user >> Pig FLATTEN statement


Re: Pig FLATTEN statement
I think the parse UDF declares an output schema with 5 columns, but not all
of the tuples it returns actually have 5 fields.
Can you check whether the UDF always returns tuples with 5 fields?

Thanks,
Thejas
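
The trace quoted below makes this concrete: Pig's DefaultTuple delegates field access to a backing list, so projecting a field position that the schema promises but the tuple lacks fails exactly the way a plain `ArrayList.get` does. A minimal sketch, independent of Pig itself (the field names in the comments follow the DESCRIBE output below):

```java
import java.util.ArrayList;
import java.util.List;

public class ArityMismatch {
    public static void main(String[] args) {
        // After the first FLATTEN, the declared schema has 6 fields:
        // queryString, id, path, selector, type, values (indices 0..5).
        // Suppose the UDF emitted a tuple with only 2 of them:
        List<Object> shortTuple = new ArrayList<>();
        shortTuple.add("some query"); // queryString
        shortTuple.add("some id");    // id -- remaining fields missing
        try {
            // POProject asks for index 5 (values) because the schema says
            // it exists; the backing list of size 2 throws, which is the
            // "Index: 5, Size: 2" in the stack trace below.
            shortTuple.get(5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("projection failed: " + e.getMessage());
        }
    }
}
```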
On 11/29/11 1:47 AM, Daan Gerits wrote:
> Hello everyone,
>
> I have some issues with the Pig FLATTEN statement: I receive several exceptions when trying to flatten a bag.
>
> I read in JIRA and on the mailing lists that other people have had issues flattening a bag embedded within a tuple, but that should have been fixed in versions 0.8.1 and 0.9.0. I've tried 0.9.0 and 0.9.1; both give me the same results.
>
> Any help is greatly appreciated,
>
> Daan Gerits
>
> ==================== Snippet Start ===================>
> parserResults = FOREACH fetchResultsFlattened {
>          parsed = parse('GoogleSearchItems.xml', content);
>          GENERATE queryString, FLATTEN(parsed);
>      }
>
> DESCRIBE parserResults;
> parserResults: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values: {(value: chararray)}}
>
> parserValues = FOREACH parserResults
>      GENERATE queryString, id, path, selector, type, FLATTEN(values);
>
> DESCRIBE parserValues;
> parserValues: {null::queryString: chararray,fields::id: chararray,fields::path: chararray,fields::selector: chararray,fields::type: chararray,fields::values::value: chararray}
>
> DUMP parserValues;
> java.lang.IndexOutOfBoundsException: Index: 5, Size: 2
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:158)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:575)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
>
> ==================== Snippet End ===================>