Pig user mailing list: OOM in mappers


Re: OOM in mappers
What version of Pig are you using? You might want to try 0.9.1.
This sounds like the issue described in
https://issues.apache.org/jira/browse/PIG-1815
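
If upgrading is not an option right away, one crude way to check whether the
combiner is involved (a diagnostic sketch only; "myscript.pig" is a
placeholder) is to disable combiner use via the pig.exec.nocombiner property
and see whether the maps still run out of memory:

    # run once with Pig's combiner disabled (script name is a placeholder)
    PIG_OPTS="-Dpig.exec.nocombiner=true" pig myscript.pig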

Thanks,
Thejas

On 10/10/11 2:22 PM, Shubham Chopra wrote:
> The job I am trying to run performs some projections and aggregations. I see
> that the map tasks repeatedly fail with an OOM, with the following stack trace:
>
> Error: java.lang.OutOfMemoryError: Java heap space
> at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:69)
> at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:82)
> at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
> at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:109)
> at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
> at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
> at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
> at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
> at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
> at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:141)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:238)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:171)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:162)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
>
>
> An analysis of the heap dump showed that, apart from the io.sort buffer, the
> remaining memory was being consumed almost entirely by
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
> (predominantly by an ArrayList and a POForeach).
>
> Should the combiner usage be causing this high memory consumption? Is there
> any way to make the combiner run more frequently and aggregate the data more
> aggressively? My data shrinks by a factor of at least 10 after the combiner
> step, and it is partitioned so as to maximize the combiner's effectiveness.
>
> Thanks,
> Shubham.
>
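
For context, a job of the shape described above (projections followed by
aggregations) might look roughly like the following sketch. It is
illustrative only; the input path, schema, and alias names are invented.
SUM and COUNT are algebraic functions, so Pig plans a combiner for the
GROUP/FOREACH pair, which is the code path implicated in the stack trace.

    -- hypothetical projection + aggregation job (all names invented)
    raw    = LOAD 'input' AS (user:chararray, bytes:long, ts:long);
    slim   = FOREACH raw GENERATE user, bytes;            -- projection
    grpd   = GROUP slim BY user;
    totals = FOREACH grpd GENERATE group AS user,         -- aggregation
                 SUM(slim.bytes) AS total_bytes, COUNT(slim) AS n;
    STORE totals INTO 'output';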
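
On the question of making the combiner run more frequently: in Hadoop
0.20/1.x the map-side combiner runs when the in-memory output buffer spills
(the sortAndSpill frame in the trace above), so spill frequency is the knob
that controls it. A hedged sketch of the relevant properties, with
illustrative values only:

    # a smaller or earlier-spilling sort buffer means more frequent spills,
    # so the combiner runs on smaller chunks (values are examples only)
    PIG_OPTS="-Dio.sort.mb=100 -Dio.sort.spill.percent=0.70" pig myscript.pig

min.num.spills.for.combine (default 3) also gates whether the combiner runs
again while merging spill files. Whether any of this helps here is
uncertain; if PIG-1815 is the real cause, upgrading is the fix.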