Re: OOM in mappers
Thejas Nair 2011-10-11, 21:02
Looks like there is a bug that needs fixing. Can you open a jira with
the details? Please add information including - the io.sort.mb setting,
-Xmx for map task, information about udfs you are using.

As a workaround, you can spawn more map tasks - turn split combination
off and specify a smaller split size. For example -

    -Dpig.splitCombination=false -Dmapred.max.split.size=33554432

Let me know if the workaround works for you.

-Thejas

On 10/11/11 7:47 AM, Shubham Chopra wrote:
> Hi Thejas,
>
> I am using 0.9. What I see is that the members POForeach and
> myPlans:ArrayList of PODemux seem to be keeping two deep copies of the same
> set of databags.
>
> Thanks,
> Shubham.
>
> On Mon, Oct 10, 2011 at 6:57 PM, Thejas Nair<[EMAIL PROTECTED]> wrote:
>
>> What version of pig are you using? You might want to try 0.9.1.
>> This sounds like the issue described in -
>> https://issues.apache.org/jira/browse/PIG-1815 .
>>
>> Thanks,
>> Thejas
>>
>>
>> On 10/10/11 2:22 PM, Shubham Chopra wrote:
>>
>>> The job I am trying to run performs some projections and aggregations.
>>> I see that maps continuously fail with an OOM with the following stack trace:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
>>>     at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
>>>     at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>>>     at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
>>>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>>>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>     at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>     at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>>     at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>     at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>>>
>>>
>>> An analysis of the heapdump showed that apart from the io sort buffer, the
>>> remaining memory was being consumed almost in its entirety by
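
For a sense of scale on the workaround suggested in the reply at the top of this message: 33554432 bytes is 32 MB (32 * 1024 * 1024), so with split combination turned off each map task should be handed at most roughly that much input. The sketch below is only that arithmetic, written in Python. The helper name `estimated_map_tasks` and the 10 GB input size are made up for illustration, and the real number of splits also depends on the input format, file boundaries and block size.

```python
MIB = 1024 * 1024

# Value passed as -Dmapred.max.split.size in the workaround.
MAX_SPLIT_BYTES = 33554432


def estimated_map_tasks(input_bytes: int, max_split_bytes: int) -> int:
    """Rough estimate: one map task per split of at most max_split_bytes.

    Hypothetical helper for illustration only. It treats the input as a
    single splittable file and ignores block and file boundaries.
    """
    if input_bytes < 0 or max_split_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-input_bytes // max_split_bytes)


if __name__ == "__main__":
    print(MAX_SPLIT_BYTES // MIB)  # split size in MB
    # Assumed example input of 10 GB (10 * 1024 MB).
    print(estimated_map_tasks(10 * 1024 * MIB, MAX_SPLIT_BYTES))
```

Halving the split size roughly doubles the number of map tasks, so each task has to hold about half as much data in the combiner, which is the reason a smaller split can relieve heap pressure until the underlying bug is fixed.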