Pig >> mail # user >> java.lang.OutOfMemoryError when using TOP udf


Re: java.lang.OutOfMemoryError when using TOP udf
Hey Dmitriy,

I attached the script. It is not a plain Pig script, because I do
some preprocessing before submitting it to the cluster, but the general
idea of what I submit should be clear.

Thanks in advance!

On Fri, Nov 18, 2011 at 12:07 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Ok, so it's something in the rest of the script that's causing this to
> happen. Ruslan, if you send your script, I can probably figure out why
> (usually, it's using another, non-algebraic udf in your foreach, or for
> pig 0.8, generating a constant in the foreach).
>
> D
>
> On Thu, Nov 17, 2011 at 9:59 AM, pablomar
> <[EMAIL PROTECTED]> wrote:
>> According to the stack trace, the algebraic mode is not being used.
>> It shows:
>> updateTop(Top.java:139)
>> exec(Top.java:116)
>>
>> On 11/17/11, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>> The top udf does not try to process all data in memory if the algebraic
>>> optimization can be applied. It does need to keep the topn numbers in memory
>>> of course. Can you confirm algebraic mode is used?
>>>
>>> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>>
>>>>
>>>> I encounter java.lang.OutOfMemoryError when using the TOP udf. It seems that
>>>> the udf tries to process all data in memory.
>>>>
>>>> Is there a workaround for TOP? Or maybe there is some other way of getting
>>>> top results? I cannot use LIMIT since I need the top 5% of data, not a constant
>>>> number of rows.
>>>>
>>>>
>>>>
>>>> I am using:
>>>>
>>>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>>>
>>>>
>>>>
>>>> The stack trace is:
>>>>
>>>> [2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor
>>>>
>>>> [2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
>>>>
>>>> [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>>>>
>>>> [2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>>>>
>>>> [2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>
>>>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child :
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>
>>>>                at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>                at java.lang.String.<init>(String.java:215)
>>>>                at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>>>                at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>>>                at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>>>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>>>                at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>>>                at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>>>                at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>>>                at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>>>                at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>>>                at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>>>                at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>>>                at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>>>
>>>>                at

Best Regards,
Ruslan Al-Fakikh
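
For readers of this thread, here is a minimal sketch of why the algebraic mode discussed above matters. It is plain Java with no Pig dependency and is not the code of Pig's TOP udf; the class TopNSketch and its methods partialTop, mergeTop and rowsForPercent are hypothetical names made up for illustration. The idea is that each chunk of input is reduced to at most n values, and the partial results are then merged, so only about n values per chunk are held in memory at once. In Pig, an algebraic udf exposes this split through the getInitial, getIntermed and getFinal methods of the Algebraic interface. The last helper assumes the total row count is already known (for example from a separate counting pass) and turns "top 5%" into a concrete n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; this is not Pig's TOP implementation. */
public class TopNSketch {

    /** Returns the n largest values of one chunk, largest first, keeping at most n values in memory. */
    public static List<Long> partialTop(Iterable<Long> chunk, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest value kept so far sits on top and is evicted first.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (long v : chunk) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll();
                heap.add(v);
            }
        }
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    /** Merges partial results: the top n of the per-chunk top n equals the top n of all rows. */
    public static List<Long> mergeTop(List<List<Long>> partials, int n) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> p : partials) {
            candidates.addAll(p);
        }
        return partialTop(candidates, n);
    }

    /** Converts a percentage of a known total row count into a row count, rounding up. */
    public static int rowsForPercent(long totalRows, int percent) {
        return Math.toIntExact((totalRows * percent + 99) / 100);
    }

    public static void main(String[] args) {
        List<Long> chunkA = Arrays.asList(5L, 1L, 9L, 3L);
        List<Long> chunkB = Arrays.asList(7L, 2L, 8L);
        int n = 2;
        // Each chunk is reduced independently, then the small partial lists are merged.
        List<List<Long>> partials = Arrays.asList(partialTop(chunkA, n), partialTop(chunkB, n));
        System.out.println(mergeTop(partials, n));   // prints [9, 8]
        System.out.println(rowsForPercent(40, 5));   // prints 2
    }
}
```

The stack trace above goes through TOP.exec rather than through partial results, which matches pablomar's observation that the algebraic path was not used, so the whole bag had to be iterated in a single call.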