Re: OOM in mappers
What version of Pig are you using? You might want to try 0.9.1.
This sounds like the issue described in
https://issues.apache.org/jira/browse/PIG-1815.
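
If you are not sure which version is installed, a quick check from the
shell (assuming the pig launcher script is on your PATH):

    # prints something like "Apache Pig version 0.9.1"
    pig -version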

Thanks,
Thejas

On 10/10/11 2:22 PM, Shubham Chopra wrote:
> The job I am trying to run performs some projections and aggregations. I see
> that the maps repeatedly fail with an OOM, with the following stack trace:
>
> Error: java.lang.OutOfMemoryError: Java heap space
> at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
> at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
> at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
> at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
> at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
> at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
> at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
> at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
> at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
> at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>
>
> An analysis of the heap dump showed that, apart from the io.sort buffer, the
> remaining memory was being consumed almost entirely by
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
> (predominantly by an ArrayList and a POForeach).
>
> Should using the combiner be causing this high memory consumption? Is there
> any way to make the combiner run more frequently and aggregate the data more
> aggressively? The data I am using shrinks by a factor of at least 10 after
> the combiner step and is neatly partitioned to maximize the effectiveness of
> the combiner.
>
> Thanks,
> Shubham.
>
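
On the second question: the combiner runs when the map-side sort buffer
spills (that is the MapOutputBuffer.sortAndSpill frame in the trace above)
and, once there are enough spill files, again while they are merged. The
knobs that control spill frequency therefore also control how often the
combiner runs. A sketch of one way to pass them, assuming Hadoop 0.20-era
property names (verify them against your cluster's version) and a
hypothetical script name:

    # A smaller sort buffer and a lower spill threshold mean more frequent
    # spills, so the combiner runs more often on smaller batches.
    # min.num.spills.for.combine (default 3) is the minimum number of spill
    # files for the combiner to also run during the merge phase.
    # Note: -D options must come before the other pig arguments.
    pig -Dio.sort.mb=64 \
        -Dio.sort.spill.percent=0.60 \
        -Dmin.num.spills.for.combine=3 \
        myscript.pig

    # To rule the combiner in or out as the memory consumer, disable it:
    pig -Dpig.exec.nocombiner=true myscript.pig

None of this explains the OOM by itself; if PIG-1815 is indeed the cause,
upgrading is the real fix and these settings are only a stopgap.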