Re: OOM in mappers
Looks like there is a bug that needs fixing. Can you open a jira with
the details? Please include the io.sort.mb setting, the -Xmx for the map
task, and information about the UDFs you are using.
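
For reference, both of those typically come from the job configuration;
a minimal sketch, assuming stock Hadoop 0.20-era property names and the
stock default values:

    io.sort.mb=100                    # map-side sort buffer size, in MB
    mapred.child.java.opts=-Xmx200m   # child JVM options; the -Xmx here is the map task heap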

As a workaround, you can spawn more map tasks: turn split combination
off and specify a smaller split size. For example:
-Dpig.splitCombination=false -Dmapred.max.split.size=33554432
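
As a sketch of a full invocation (the script name below is hypothetical;
only the two -D properties are part of the workaround), note that the -D
properties need to appear before the script name so pig passes them
through to the JVM:

    pig -Dpig.splitCombination=false -Dmapred.max.split.size=33554432 myscript.pig

33554432 bytes = 32 MB per split.
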
Let me know if the workaround works for you.

-Thejas
On 10/11/11 7:47 AM, Shubham Chopra wrote:
> Hi Thejas,
>
> I am using 0.9. What I see is that the members POForeach and
> myPlans:ArrayList of PODemux seem to be keeping two deep copies of the same
> set of databags.
>
> Thanks,
> Shubham.
>
> On Mon, Oct 10, 2011 at 6:57 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>
>> What version of pig are you using? You might want to try 0.9.1.
>> This sounds like the issue described in
>> https://issues.apache.org/jira/browse/PIG-1815 .
>>
>> Thanks,
>> Thejas
>>
>>
>> On 10/10/11 2:22 PM, Shubham Chopra wrote:
>>
>>> The job I am trying to run performs some projections and aggregations. I
>>> see
>>> that maps continuously fail with an OOM with the following stack trace:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>         at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
>>>         at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
>>>         at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>>>         at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
>>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
>>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>         at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>>>         at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>         at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>>         at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
>>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
>>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>         at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>>>
>>>
>>> An analysis of the heapdump showed that apart from the io sort buffer, the
>>> remaining memory was being consumed almost in its entirety by