Re: Correlation function out of memory error
Hi, Houssam:
Can you try making your HDFS block size smaller and also setting 'SET
pig.noSplitCombination true;' in Pig? (That turns off split combination, so
the number of mappers will equal the number of file blocks.)
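
A minimal sketch of that suggestion, written against a cut-down version of the script quoted below. The three-column schema is a placeholder for illustration, not the real 500-column input. As far as I can tell, it is the value `true` for `pig.noSplitCombination` that switches split combination off, so that is what the sketch assumes. The HDFS block size is fixed when a file is written, so the input would have to be re-uploaded with the smaller block size first; that step is not shown.

```pig
-- Sketch only: the three-column schema is a placeholder for illustration.
-- Assumption: 'true' disables split combination, giving one mapper per block.
SET pig.noSplitCombination true;

A = LOAD 'random.txt' USING PigStorage(':') AS (f1:double, f2:double, f3:double);
B = GROUP A ALL;
D = FOREACH B GENERATE group, COR(A.$0, A.$1, A.$2);
DUMP D;
```

Whether this avoids the OOM is untested; it only spreads the input across more mappers so that each one has less data to combine.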

The OOM seems to happen in the COR function when it tries to combine
different data chunks in the mapper, so more mappers may help. I will try it
when I get a cluster to play with.

Johnny
On Fri, Feb 22, 2013 at 2:18 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:

> Hi, Houssam:
> What's the error in your pig log file? I was trying to reproduce it with
> 1000 rows, 500 columns.
> A = load 'random.txt' using PigStorage(':') as
> (f1:double,f2:double,.........,f500:double);
> B = group A all;
> D = foreach B generate group,COR(A.$0,A.$1,A.$2,A.$3,.......A.$499);
> dump D;
>
> The exception in pig log file is
> Backend error message
> ---------------------
> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.Double.valueOf(Double.java:492)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
>     at org.apache.pig.builtin.COR.combine(COR.java:258)
>     at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
>     at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.physi
>
> Backend error message
> ---------------------
> Error: java.lang.OutOfMemoryError: Java heap space
>     at java.lang.Double.valueOf(Double.java:492)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
>     at org.apache.pig.builtin.COR.combine(COR.java:258)
>     at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
>     at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.ex
>
> Backend error message
> ---------------------
> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.ArrayList.<init>(ArrayList.java:112)
>     at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:67)
>     at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:67)
>     at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:142)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
>     at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
>     at org.apache.pig.builtin.COR.combine(COR.java:258)