Pig >> mail # user >> OOM in mappers
Re: OOM in mappers
Looks like there is a bug that needs fixing. Can you open a jira with
the details? Please include the io.sort.mb setting, the -Xmx for the
map task, and information about any UDFs you are using.
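For reference, both of those settings can be passed per job on the Pig command line. A minimal sketch, where the values and the script name are placeholders and not from this thread:

```shell
# Set the spill buffer size and the map-task heap for a single job.
# io.sort.mb and mapred.child.java.opts are standard Hadoop properties;
# the values and script.pig are illustrative placeholders.
pig -Dio.sort.mb=100 \
    -Dmapred.child.java.opts=-Xmx512m \
    script.pig
```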

As a workaround, you can spawn more map tasks: turn split combination
off and specify a smaller split size.
For example -
-Dpig.splitCombination=false -Dmapred.max.split.size=33554432
Let me know if the workaround works for you.
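Spelled out as a full command line, the workaround might look like this (script.pig is a placeholder; 33554432 bytes is 32 MB):

```shell
# Disable Pig's split combination and cap each input split at 32 MB
# (32 * 1024 * 1024 = 33554432 bytes), so more, smaller map tasks run.
# script.pig stands in for the actual script being debugged.
pig -Dpig.splitCombination=false \
    -Dmapred.max.split.size=33554432 \
    script.pig
```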

-Thejas
On 10/11/11 7:47 AM, Shubham Chopra wrote:
> Hi Thejas,
>
> I am using 0.9. What I see is that the POForeach member and the
> myPlans ArrayList of PODemux seem to be keeping two deep copies of the
> same set of databags.
>
> Thanks,
> Shubham.
>
> On Mon, Oct 10, 2011 at 6:57 PM, Thejas Nair<[EMAIL PROTECTED]>  wrote:
>
>> What version of Pig are you using? You might want to try 0.9.1.
>> This sounds like the issue described in
>> https://issues.apache.org/jira/browse/PIG-1815 .
>>
>> Thanks,
>> Thejas
>>
>>
>> On 10/10/11 2:22 PM, Shubham Chopra wrote:
>>
>>> The job I am trying to run performs some projections and aggregations. I see
>>> that maps continuously fail with an OOM with the following stack trace:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>         at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
>>>         at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
>>>         at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>>>         at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
>>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
>>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>         at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>>>         at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>         at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>>>         at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
>>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
>>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
>>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>         at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>>>
>>>
>>> An analysis of the heap dump showed that, apart from the io sort buffer, the
>>> remaining memory was being consumed almost in its entirety by