Re: Java heap error
Syed,

You are likely hitting https://issues.apache.org/jira/browse/PIG-1442 .
Your query and stack trace look very similar to the ones in that JIRA
ticket. It may get fixed in the 0.8 release.

Ashutosh

On Thu, Jul 8, 2010 at 13:42, Syed Wasti <[EMAIL PROTECTED]> wrote:
> Sorry about the delay, I was held up with other things.
> Here are the script and the errors:
>
> AA = LOAD 'table1' USING PigStorage('\t') as
> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>
> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>
> AC = FILTER AB BY o == 1;
>
> AD = GROUP AC BY (ID, b);
>
> AE = FOREACH AD { A = DISTINCT AC.d;
>        GENERATE group.ID, (chararray) 'S' AS type, group.b,
>        (int) COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct; }
>
> The same steps are repeated to load 5 different tables and then a UNION is
> done on them.
>
> Final_res = UNION AE, AF, AG, AH, AI;
>
> The actual number of columns will be 15; here I am showing it with one table.
>
> Final_table =   FOREACH Final_res GENERATE ID,
>                (type == 'S' AND b == 1?cnt:0) AS 12_tmp,
>                (type == 'S' AND b == 2?cnt:0) AS 13_tmp,
>                (type == 'S' AND b == 1?cnt_distinct:0) AS 12_distinct_tmp,
>                (type == 'S' AND b == 2?cnt_distinct:0) AS 13_distinct_tmp;
>
> It works fine up to here; only after adding this last part of the
> query does it start throwing heap errors.
>
> grp_id =    GROUP Final_table BY ID;
>
> Final_data = FOREACH grp_id GENERATE group AS ID,
>        SUM(Final_table.12_tmp), SUM(Final_table.13_tmp),
>        SUM(Final_table.12_distinct_tmp), SUM(Final_table.13_distinct_tmp);
>
> STORE Final_data;
>
>
> Error: java.lang.OutOfMemoryError: Java heap space
>  at java.util.ArrayList.<init>(ArrayList.java:112)
>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
>  at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>  at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:55)
>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:130)
>  at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:289)
>  at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>  at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>  at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>
>
> Error: java.lang.OutOfMemoryError: Java heap space
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.createDataBag(POCombinerPackage.java:139)
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:148)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:168)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:159)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>  at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1217)
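
For readers following the quoted script: its last two statements turn each unioned row into per-type columns with a conditional, then sum those columns per ID. Both stack traces pass through NewCombinerRunner.combine, so the failure happens while partial results are being combined on the map side. The Python below is only a rough sketch of that logic, not Pig code and not the poster's data. The function names (pivot, partial_sums, merge_sums), the sample rows and the 'T' type are all made up for illustration.

```python
from collections import defaultdict

# Output columns of the quoted Final_table statement, kept as plain strings.
COLUMNS = ("12_tmp", "13_tmp", "12_distinct_tmp", "13_distinct_tmp")


def pivot(row):
    """Mimic the conditional projection of the script for one unioned row."""
    is_s = row["type"] == "S"
    return {
        "ID": row["ID"],
        "12_tmp": row["cnt"] if is_s and row["b"] == 1 else 0,
        "13_tmp": row["cnt"] if is_s and row["b"] == 2 else 0,
        "12_distinct_tmp": row["cnt_distinct"] if is_s and row["b"] == 1 else 0,
        "13_distinct_tmp": row["cnt_distinct"] if is_s and row["b"] == 2 else 0,
    }


def partial_sums(rows):
    """Combiner-style step: sum each column per ID within one chunk of rows."""
    sums = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for row in rows:
        pivoted = pivot(row)
        for col in COLUMNS:
            sums[pivoted["ID"]][col] += pivoted[col]
    return dict(sums)


def merge_sums(partials):
    """Reducer-style step: add up the partial sums produced for each chunk."""
    total = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for partial in partials:
        for key, cols in partial.items():
            for col in COLUMNS:
                total[key][col] += cols[col]
    return dict(total)


# Hypothetical rows shaped like the output of the UNION (values are invented).
chunk_a = [
    {"ID": "x", "type": "S", "b": 1, "cnt": 3, "cnt_distinct": 2},
    {"ID": "x", "type": "S", "b": 2, "cnt": 5, "cnt_distinct": 4},
]
chunk_b = [
    {"ID": "x", "type": "S", "b": 1, "cnt": 1, "cnt_distinct": 1},
    {"ID": "y", "type": "T", "b": 1, "cnt": 9, "cnt_distinct": 9},
]

result = merge_sums([partial_sums(chunk_a), partial_sums(chunk_b)])
print(result["x"])
```

Summing per chunk and then merging gives the same totals as summing all rows at once, which is the property that allows an aggregate such as SUM to run in a combiner.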