Pig >> mail # user >> workaround for  java.lang.OutOfMemoryError: Java heap space?

Workaround for java.lang.OutOfMemoryError: Java heap space?
I have a Pig script that works well on small test data sets but fails when run over realistically sized data. The logs show:
  INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201106061024_0331 has failed!
  job_201106061024_0331   CitedItemsGrpByDocId,DedupTCPerDocId    GROUP_BY,COMBINER       Message: Job failed!
 attempt_201106061024_0331_m_000198_0  […]   Error: java.lang.OutOfMemoryError: Java heap space
  and the same error for all attempts at a few of the other (many) map tasks for this job.

I believe this job corresponds to these lines in my Pig script:

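 -- Collect all citation records for each cited document.
 CitedItemsGrpByDocId = group CitedItems by citeddocid;
 -- For each cited document, count its distinct citing documents.
 DedupTCPerDocId = foreach CitedItemsGrpByDocId {
         CitingDocids = CitedItems.citingdocid;
         UniqCitingDocids = distinct CitingDocids;
         generate group, COUNT(UniqCitingDocids) as tc;
 };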

I tried increasing mapred.child.java.opts, but the job then failed in a setup stage with:
  Error occurred during initialization of VM
  Could not reserve enough space for object heap

Are there job configurations or parameters for Hadoop or Pig that I can set to get around this? Is there a Pig Latin circumlocution, or a better way to express what I want, that is not as memory-hungry?

Thanks in advance,


William F Dowling
Sr Technical Specialist, Software Engineering
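
On the Pig Latin question above: one rewrite that may use less memory is to deduplicate the (citeddocid, citingdocid) pairs with a top-level distinct before grouping. The distinct then runs as its own MapReduce step instead of inside each group's bag. This is only a sketch. The aliases CitedPairs, UniqPairs, and UniqPairsGrpByDocId are made up for illustration, it has not been run against this data, and it may or may not remove the heap error reported here. Null handling in COUNT may also differ slightly from the nested version.

```pig
-- Assumes CitedItems has the fields citeddocid and citingdocid, as in the script above.
-- Keep only the two fields needed for the count.
CitedPairs = foreach CitedItems generate citeddocid, citingdocid;
-- Top-level distinct: duplicate (cited, citing) pairs are removed before any grouping.
UniqPairs = distinct CitedPairs;
-- Group the already-unique pairs by cited document.
UniqPairsGrpByDocId = group UniqPairs by citeddocid;
-- Each remaining tuple in a group is one distinct citing document.
DedupTCPerDocId = foreach UniqPairsGrpByDocId generate group, COUNT(UniqPairs) as tc;
```

The trade-off is an additional MapReduce job for the distinct, in exchange for not holding a per-document bag of citing ids while deduplicating.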