Re: OOM in mappers
Thejas Nair 2011-10-10, 22:57
What version of Pig are you using? You might want to try 0.9.1.
This sounds like the issue described in https://issues.apache.org/jira/browse/PIG-1815.

Thanks,
Thejas

On 10/10/11 2:22 PM, Shubham Chopra wrote:
> The job I am trying to run performs some projections and aggregations. I see
> that maps continuously fail with an OOM with the following stack trace:
>
> Error: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
>     at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
>     at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>     at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>     at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>     at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>     at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>
>
> An analysis of the heapdump showed that apart from the io sort buffer, the
> remaining memory was being consumed almost in its entirety by
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
> (predominantly by an ArrayList, and a POForeach)
>
> Should the combiner usage be causing this high memory consumption? Is there
> any way to make the combiner run more frequently and aggregate the data more
> aggressively? The data I am using reduces by a factor of at least 1:10 after
> the combiner step and is neatly partitioned to maximize the effectiveness of
> combiner.
>
> Thanks,
> Shubham.