Pig user mailing list: java.lang.OutOfMemoryError when using TOP udf


Ruslan Al-fakikh 2011-11-17, 14:13
Dmitriy Ryaboy 2011-11-17, 16:43

Re: java.lang.OutOfMemoryError when using TOP udf
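According to the stack trace, the algebraic version is not being used. It shows:
TOP.updateTop(TOP.java:139)
TOP.exec(TOP.java:116)
In other words, the reducer calls TOP.exec() on the whole grouped bag instead of going through the algebraic (combiner) path.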

On 11/17/11, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> The TOP UDF does not try to process all the data in memory if the algebraic
> optimization can be applied. It does need to keep the top n values in memory,
> of course. Can you confirm that algebraic mode is used?
>
> On Nov 17, 2011, at 6:13 AM, "Ruslan Al-fakikh" <[EMAIL PROTECTED]>
> wrote:
>
>> Hey guys,
>>
>> I am getting a java.lang.OutOfMemoryError when using the TOP UDF. It seems
>> that the UDF tries to process all the data in memory.
>>
>> Is there a workaround for TOP? Or is there some other way of getting the
>> top results? I cannot use LIMIT, since I need 5% of the data, not a constant
>> number of rows.
>>
>> I am using:
>>
>> Apache Pig version 0.8.1-cdh3u2 (rexported)
>>
>> The stack trace is:
>>
>> [2011-11-16 12:34:55] INFO  (CodecPool.java:128) - Got brand-new decompressor
>>
>> [2011-11-16 12:34:55] INFO  (Merger.java:473) - Down to the last merge-pass, with 21 segments left of total size: 2057257173 bytes
>>
>> [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first memory handler call- Usage threshold init = 175308800(171200K) used = 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
>>
>> [2011-11-16 12:36:22] INFO  (SpillableMemoryManager.java:167) - first memory handler call - Collection threshold init = 175308800(171200K) used = 496500704(484863K) committed = 524288000(512000K) max = 524288000(512000K)
>>
>> [2011-11-16 12:37:28] INFO  (TaskLogsTruncater.java:69) - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>
>> [2011-11-16 12:37:28] FATAL (Child.java:318) - Error running child : java.lang.OutOfMemoryError: Java heap space
>>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>         at java.lang.String.<init>(String.java:215)
>>         at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>>         at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>>         at org.apache.pig.data.BinInterSedes.readCharArray(BinInterSedes.java:210)
>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:333)
>>         at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>>         at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
>>         at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>>         at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:237)
>>         at org.apache.pig.builtin.TOP.updateTop(TOP.java:139)
>>         at org.apache.pig.builtin.TOP.exec(TOP.java:116)
>>         at org.apache.pig.builtin.TOP.exec(TOP.java:65)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:434)
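The memory argument in this thread can be illustrated with a small Java sketch. It is not Pig's TOP source. The class and method names (BoundedTopN, partialTop, merge, rowsForPercent) are hypothetical, and values are plain longs rather than Pig tuples.

Each partial step keeps at most n values in a min-heap. This is why a combinable (algebraic) top-n needs only O(n) memory per step. A non-combined exec() call, by contrast, receives the entire group's bag at once. The sketch also assumes the total row count is already known, for example from a separate counting pass. That is necessary because a "top 5%" request has to be turned into a fixed n before any top-n logic can run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: this is NOT Pig's TOP implementation.
// Values are plain longs instead of Pig tuples (an assumption for brevity).
public class BoundedTopN {

    // Keeps at most n of the largest values seen, using a min-heap of size n,
    // so memory stays O(n) no matter how many values are streamed through.
    public static List<Long> partialTop(Iterable<Long> values, int n) {
        List<Long> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        PriorityQueue<Long> heap = new PriorityQueue<>(); // head = smallest kept value
        for (Long v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the smallest of the kept values
                heap.add(v);
            }
        }
        out.addAll(heap);
        out.sort(Comparator.reverseOrder()); // largest first
        return out;
    }

    // Final step: merges partial top-n lists (e.g. one per map-side combiner).
    // Its input holds at most partials.size() * n values, never the full data set.
    public static List<Long> merge(List<List<Long>> partials, int n) {
        List<Long> combined = new ArrayList<>();
        for (List<Long> p : partials) {
            combined.addAll(p);
        }
        return partialTop(combined, n);
    }

    // Turns "top X percent" into a row count, rounding up with integer math.
    // Assumes totalRows is already known (hypothetical separate count).
    public static long rowsForPercent(long totalRows, int percent) {
        return (totalRows * percent + 99) / 100;
    }

    public static void main(String[] args) {
        List<Long> chunkA = List.of(12L, 3L, 45L, 7L, 30L); // List.of requires Java 9+
        List<Long> chunkB = List.of(50L, 1L, 44L, 8L);
        int n = Math.toIntExact(rowsForPercent(chunkA.size() + chunkB.size(), 30));
        List<List<Long>> partials = List.of(partialTop(chunkA, n), partialTop(chunkB, n));
        System.out.println(merge(partials, n)); // prints [50, 45, 44]
    }
}
```

Running main prints [50, 45, 44]. Nine rows at 30% round up to n = 3. Each chunk's partial list holds only three values before the merge, regardless of chunk size.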

Dmitriy Ryaboy 2011-11-17, 20:07
Ruslan Al-Fakikh 2011-11-21, 14:11
Dmitriy Ryaboy 2011-11-21, 16:32
Ruslan Al-fakikh 2011-11-21, 17:10
Jonathan Coveney 2011-11-21, 18:22
pablomar 2011-11-21, 20:53
Jonathan Coveney 2011-11-21, 21:53
Dmitriy Ryaboy 2011-11-21, 22:20
Ruslan Al-fakikh 2011-11-22, 15:08
pablomar 2011-11-23, 03:10
Jonathan Coveney 2011-11-23, 07:45
Ruslan Al-fakikh 2011-11-24, 11:55
Ruslan Al-fakikh 2011-12-15, 14:57
Ruslan Al-fakikh 2011-12-16, 13:32
Dmitriy Ryaboy 2011-12-16, 20:15
Ruslan Al-fakikh 2011-12-22, 01:37
Ruslan Al-fakikh 2011-12-27, 15:48
Jonathan Coveney 2011-12-28, 19:18
Ruslan Al-fakikh 2012-01-06, 03:14
Jonathan Coveney 2012-01-06, 04:10
Ruslan Al-fakikh 2011-12-28, 22:21