Pig, mail # user - Java heap error


Re: Java heap error
Ashutosh Chauhan 2010-07-08, 20:59
Syed,

You are likely hitting https://issues.apache.org/jira/browse/PIG-1442 .
Your query and stack trace look very similar to the one in that jira
ticket. It may be fixed in the 0.8 release.
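
In the meantime, a workaround that often helps with combiner-stage OOMs
(your second trace dies inside PigCombiner) is to disable the combiner,
and optionally give the child JVMs more heap. These are standard
Pig/Hadoop properties; I haven't verified them against your exact job,
and the script name below is a placeholder:

```
pig -Dpig.exec.nocombiner=true \
    -Dmapred.child.java.opts=-Xmx1024m \
    your_script.pig
```

Turning the combiner off trades some extra shuffle volume for not
building the intermediate bags that are blowing the heap.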

Ashutosh

On Thu, Jul 8, 2010 at 13:42, Syed Wasti <[EMAIL PROTECTED]> wrote:
> Sorry about the delay; I was held up with other things.
> Here is the script and the errors below:
>
> AA = LOAD 'table1' USING PigStorage('\t') as
> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>
> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>
> AC = FILTER AB BY o == 1;
>
> AD = GROUP AC BY (ID, b);
>
> AE = FOREACH AD {
>        A = DISTINCT AC.d;
>        GENERATE group.ID, (chararray) 'S' AS type, group.b,
>                (int) COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct;
> };
>
> The same steps are repeated to load 5 different tables and then a UNION is
> done on them.
>
> Final_res = UNION AE, AF, AG, AH, AI;
>
> The actual number of columns will be 15; here I am showing just one table.
>
> Final_table =  FOREACH Final_res GENERATE ID,
>                (type == 'S' AND b == 1 ? cnt : 0) AS 12_tmp,
>                (type == 'S' AND b == 2 ? cnt : 0) AS 13_tmp,
>                (type == 'S' AND b == 1 ? cnt_distinct : 0) AS 12_distinct_tmp,
>                (type == 'S' AND b == 2 ? cnt_distinct : 0) AS 13_distinct_tmp;
>
> It works fine up to this point; it is only after adding this last part of
> the query that it starts throwing heap errors.
>
> grp_id =    GROUP Final_table BY ID;
>
> Final_data = FOREACH grp_id GENERATE group AS ID,
> SUM(Final_table.12_tmp), SUM(Final_table.13_tmp),
> SUM(Final_table.12_distinct_tmp), SUM(Final_table.13_distinct_tmp);
>
> STORE Final_data;
>
>
> Error: java.lang.OutOfMemoryError: Java heap space
>  at java.util.ArrayList.<init>(ArrayList.java:112)
>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
>  at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>  at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:55)
>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:130)
>  at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:289)
>  at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>  at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>  at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>
>
> Error: java.lang.OutOfMemoryError: Java heap space
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.createDataBag(POCombinerPackage.java:139)
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:148)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:168)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:159)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)