Pig, mail # user - java.lang.OutOfMemoryError when using TOP udf


Re: java.lang.OutOfMemoryError when using TOP udf
pablomar 2011-11-17, 17:59
According to the stack trace, the Algebraic implementation is not being used.
It shows:
updateTop(TOP.java:139)
exec(TOP.java:116)
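
For context: the Algebraic/combiner path being discussed only ever holds the current top-n candidates in memory, rather than materializing the whole bag. A minimal Python sketch of that bounded-heap idea (an illustration of the technique, not Pig's actual TOP code):

```python
import heapq

def top_n(records, n, key):
    """Keep only the n largest records in memory using a bounded min-heap.
    Memory use is O(n) regardless of how many records stream through,
    which is why the combiner-style path avoids OutOfMemoryError."""
    heap = []  # min-heap of at most n (key, record) pairs
    for rec in records:
        k = key(rec)
        if len(heap) < n:
            heapq.heappush(heap, (k, rec))
        elif k > heap[0][0]:
            # New record beats the smallest retained one; swap it in.
            heapq.heapreplace(heap, (k, rec))
    # Return retained records, largest key first.
    return [rec for _, rec in sorted(heap, key=lambda t: t[0], reverse=True)]
```

When the exec()-only path runs instead (as the stack trace here shows), the entire bag is iterated on the reducer, and a large enough bag exhausts the heap.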

On 11/17/11, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> The TOP udf does not try to process all data in memory if the algebraic
> optimization can be applied. It does, of course, need to keep the top n
> tuples in memory. Can you confirm algebraic mode is used?
>
> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[EMAIL PROTECTED]>
> wrote:
>
>> Hey guys,
>>
>>
>>
>> I encounter java.lang.OutOfMemoryError when using the TOP udf. It seems
>> that the udf tries to process all the data in memory.
>>
>> Is there a workaround for TOP? Or maybe there is some other way of getting
>> top results? I cannot use LIMIT since I need the top 5% of the data, not a
>> constant number of rows.
>>
>>
>>
>> I am using:
>>
>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>
>>
>>
>> The stack trace is:
>>
>> [2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor
>>
>> [2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
>>
>> [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call - Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>>
>> [2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>>
>> [2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>
>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
>>                at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>                at java.lang.String.<init>(String.java:215)
>>                at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>                at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>                at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>                at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>                at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>                at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>                at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>                at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>                at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>>                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)