Pig user mailing list - Java heap error


Re: Java heap error
From the 2nd stack trace, it looks like the combiner did not get disabled. You can verify that by looking at the MapReduce plan in the explain output.
It looks like, for some reason, the system property 'pig.exec.nocombiner' is not getting set to 'true'.
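For reference, a minimal sketch of what that check might look like in a Pig script; the relation names are placeholders, and on older Pig versions the property may need to go into pig.properties (or be passed as a JVM system property) instead of through SET:

    -- disable the combiner for this script
    set pig.exec.nocombiner 'true';

    raw     = LOAD 'input.txt' AS (id:int, val:chararray);
    grouped = GROUP raw BY id;
    agg     = FOREACH grouped GENERATE group, COUNT(raw);

    -- EXPLAIN prints the logical, physical, and MapReduce plans;
    -- with the combiner disabled, the MapReduce plan should show no combine plan
    EXPLAIN agg;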

Can you send the other Pig script that errors out with "Error: GC overhead limit exceeded"?

-Thejas
On 7/27/10 11:27 PM, "Syed Wasti" <[EMAIL PROTECTED]> wrote:

Thank you Thejas for the response.
I want to share my feedback after trying all the recommended options.
I tried increasing the heap size, built Pig from the trunk, and disabled the combiner by setting the property you recommended. None of this worked and I am still seeing the same errors; the only thing that works for me is using the UDF I created.
Another case where it errors out with "Error: GC overhead limit exceeded" is in the reduce jobs, while they are copying map outputs. The job just hangs there for a long time (over 30 minutes) and finally errors out.
I tried changing some parameters that I thought should be related, but that didn't help. Do you think this could be related to the newly created JIRA, or would you recommend any properties that I should try?
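For context, the copy phase of the reduce is usually governed by the child JVM heap and shuffle buffer settings; a sketch of how such properties might be set from a Pig script, assuming the Hadoop 0.20-style property names and with purely illustrative values:

    -- give the map/reduce child JVMs a larger heap (value is only an example)
    set mapred.child.java.opts '-Xmx1024m';
    -- hold less of the copied map output in memory during the shuffle
    set mapred.job.shuffle.input.buffer.percent '0.50';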

If it helps, I am pasting the stack traces of my map task failures when running the script with the combiner disabled. Thanks.

Regards
Syed Wasti
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.ArrayList.<init>(ArrayList.java:112)
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:60)
    at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:66)
    at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:37)
    at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:100)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:267)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:250)
    at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:568)
    at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:48)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.AbstractList.iterator(AbstractList.java:273)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:203)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:259)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:184)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)