Pig, mail # user - workaround for java.lang.OutOfMemoryError: Java heap space?


william.dowling@... 2011-06-10, 18:15
Re: workaround for java.lang.OutOfMemoryError: Java heap space?
Thejas M Nair 2011-06-10, 18:50
I have seen this happen when there is a very large number of distinct values
for a set of group keys. When the combiner gets used, the input records for the
reduce task already contain partial distinct bags, and these can be large enough
that MapReduce runs out of memory trying to load them.

You can modify the query as described in comment #1 of -
https://issues.apache.org/jira/browse/PIG-1846
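For reference, a common rewrite of this pattern (a sketch only, using the
relation and field names from the script quoted below; the exact variant in
that JIRA comment may differ) is to project and DISTINCT the key pairs before
grouping, so no nested distinct bag has to be materialized per group:

    -- Project just the two key fields, de-duplicate globally, then count.
    CitedPairs      = foreach CitedItems generate citeddocid, citingdocid;
    UniqCitedPairs  = distinct CitedPairs;
    GrpByDocId      = group UniqCitedPairs by citeddocid;
    DedupTCPerDocId = foreach GrpByDocId generate group, COUNT(UniqCitedPairs) as tc;

Since the duplicates are removed before the GROUP, each reduce-side bag only
holds unique (citeddocid, citingdocid) pairs, which keeps record sizes down.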

Or you can add the following to your script to disable the combiner -

set pig.exec.nocombiner true;
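A minimal sketch of where the directive goes (an assumption about placement:
set directives apply to the statements that follow them, so put it before the
GROUP/FOREACH whose job should skip the combiner):

    set pig.exec.nocombiner true;
    CitedItemsGrpByDocId = group CitedItems by citeddocid;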

Thanks,
Thejas
On 6/10/11 11:15 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> I have a pig script that is working well for small test data sets but fails on
> a run over realistic-sized data. Logs show
>   INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201106061024_0331 has failed!
>   job_201106061024_0331   CitedItemsGrpByDocId,DedupTCPerDocId
> GROUP_BY,COMBINER       Message: Job failed!
>   attempt_201106061024_0331_m_000198_0  [S]   Error:
> java.lang.OutOfMemoryError: Java heap space
>   and similar failures for all attempts at a few of the other (many) map tasks
> for this job.
>
> I believe  this job corresponds to these lines in my pig script:
>
>  CitedItemsGrpByDocId = group CitedItems by citeddocid;
>  DedupTCPerDocId = foreach CitedItemsGrpByDocId {
>          CitingDocids = CitedItems.citingdocid;
>          UniqCitingDocids = distinct CitingDocids;
>          generate group, COUNT(UniqCitingDocids) as tc;
>       };
>
> I tried increasing mapred.child.java.opts but the job failed in a setup stage
> with
>   Error occurred during initialization of VM
>   Could not reserve enough space for object heap
>
> Are there job configurations/parameters for Hadoop or pig I can set to get
> around this? Is there a Pig Latin circumlocution, or better way to express
> what I want, that is not as memory-hungry?
>
> Thanks in advance,
>
> Will
>
> William F Dowling
> Sr Technical Specialist, Software Engineering
>
>
>
--
william.dowling@... 2011-06-10, 19:57