Pig, mail # user - java.lang.OutOfMemoryError when using TOP udf


Ruslan Al-Fakikh 2011-11-17, 14:13
Dmitriy Ryaboy 2011-11-17, 16:43
pablomar 2011-11-17, 17:59
Dmitriy Ryaboy 2011-11-17, 20:07
Ruslan Al-Fakikh 2011-11-21, 14:11
Dmitriy Ryaboy 2011-11-21, 16:32
Ruslan Al-Fakikh 2011-11-21, 17:10
Jonathan Coveney 2011-11-21, 18:22
pablomar 2011-11-21, 20:53
Jonathan Coveney 2011-11-21, 21:53
Dmitriy Ryaboy 2011-11-21, 22:20
Ruslan Al-Fakikh 2011-11-22, 15:08
Re: java.lang.OutOfMemoryError when using TOP udf
pablomar 2011-11-23, 03:10
Just a guess... could it be that the bag is kept in memory instead
of being spilled to disk?
Browsing the code of InternalCachedBag, I saw:

private void init(int bagCount, float percent) {
    factory = TupleFactory.getInstance();
    mContents = new ArrayList<Tuple>();

    long max = Runtime.getRuntime().maxMemory();
    maxMemUsage = (long)(((float)max * percent) / (float)bagCount);
    cacheLimit = Integer.MAX_VALUE;

    // set limit to 0, if memusage is 0 or really really small.
    // then all tuples are put into disk
    if (maxMemUsage < 1) {
        cacheLimit = 0;
    }

    addDone = false;
}

My guess is that cacheLimit stays at Integer.MAX_VALUE, so the bag tries to
keep everything in memory: the heap is not big enough to hold it all, but
maxMemUsage is not small enough for cacheLimit to be reset to 0.
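
A minimal standalone sketch of that arithmetic (the 512MB heap is taken from
the log quoted further down; percent is assumed to be the pig.cachedbag.memusage
default of 0.2, and the bag count is hypothetical):

// sketch only: reproduces the init() arithmetic above with assumed values
public class CacheLimitSketch {
    public static void main(String[] args) {
        long max = 512L * 1024 * 1024;   // 512MB heap, as in the log below
        float percent = 0.2f;            // assumed pig.cachedbag.memusage default
        int bagCount = 20;               // hypothetical number of live bags

        long maxMemUsage = (long) (((float) max * percent) / (float) bagCount);
        int cacheLimit = Integer.MAX_VALUE;
        if (maxMemUsage < 1) {
            cacheLimit = 0;              // only hit when the per-bag budget rounds to 0
        }

        // prints maxMemUsage=5368709 cacheLimit=2147483647: a ~5MB budget per
        // bag, so cacheLimit keeps Integer.MAX_VALUE and tuples stay in memory
        System.out.println("maxMemUsage=" + maxMemUsage + " cacheLimit=" + cacheLimit);
    }
}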
On Tue, Nov 22, 2011 at 10:08 AM, Ruslan Al-fakikh <
[EMAIL PROTECTED]> wrote:

> Jonathan,
>
> I am running it on the prod cluster in MR mode, not locally. I started to
> see the issue when the input size grew. A few days ago I found a workaround
> of setting this property:
> mapred.child.java.opts=-Xmx1024m
> But I think this is a temporary solution and the job will fail when the
> input size grows again.
>
> Dmitriy,
>
> Thanks a lot for the investigation. I'll try it.
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> Sent: November 22, 2011, 2:21
> To: [EMAIL PROTECTED]
> Subject: Re: java.lang.OutOfMemoryError when using TOP udf
>
> Ok so this:
>
> thirdLevelsTopVisitorsWithBots = FOREACH thirdLevelsByCategory {
>     count = COUNT(thirdLevelsSummed);
>     result = TOP((int)(count *
>         (double)($THIRD_LEVELS_PERCENTAGE + $BOTS_PERCENTAGE)), 3,
>         thirdLevelsSummed);
>     GENERATE FLATTEN(result);
> }
>
> requires "count" to be calculated before TOP can be applied. Since count
> can't be calculated until the reduce side, naturally, TOP can't start
> working on the map side (as it doesn't know its arguments yet).
>
> Try generating the counts * ($TLP + $BP) separately, joining them in (I am
> guessing you have no more than a few K categories -- in that case, you can
> do a replicated join), and then do the group and TOP.
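
A hedged sketch of that restructuring (relation and field names are guesses
from the script above, not the real schema; the column index passed to TOP
would also need adjusting for the wider joined tuples):

-- compute each category's N in its own pass
topNs = FOREACH thirdLevelsByCategory GENERATE
            group AS category,
            (int)(COUNT(thirdLevelsSummed) *
                  (double)($THIRD_LEVELS_PERCENTAGE + $BOTS_PERCENTAGE)) AS n;

-- topNs is small (a few K categories), so replicate it to every mapper
withN = JOIN thirdLevelsSummed BY category, topNs BY category USING 'replicated';

-- regroup and apply TOP; its first argument no longer depends on a COUNT
-- computed in the same reduce
regrouped = GROUP withN BY thirdLevelsSummed::category;
thirdLevelsTopVisitorsWithBots = FOREACH regrouped {
    n = MAX(withN.topNs::n); -- same value for every tuple in the group
    result = TOP((int)n, 3, withN);
    GENERATE FLATTEN(result);
};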
>
> On Mon, Nov 21, 2011 at 1:53 PM, Jonathan Coveney <[EMAIL PROTECTED]>
> wrote:
> > You're right pablomar...hmm
> >
> > Ruslan: are you running this in mr mode on a cluster, or locally?
> >
> > I'm noticing this:
> > [2011-11-16 12:34:55] INFO  (SpillableMemoryManager.java:154) - first
> > memory handler call- Usage threshold init = 175308800(171200K) used =
> > 373454552(364701K) committed = 524288000(512000K) max = 524288000(512000K)
> >
> > It looks like your max memory is 512MB. I've had issues with bag
> > spilling with less than 1GB allocated (-Xmx1024m).
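
For reference, both knobs can be set from the Pig script itself; the property
names are the standard Hadoop/Pig ones, and the values here are examples only
(pig.cachedbag.memusage is the fraction of the heap that cached bags may use
before spilling, 0.2 by default):

SET mapred.child.java.opts '-Xmx1024m';
SET pig.cachedbag.memusage '0.1';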
> >
> > 2011/11/21 pablomar <[EMAIL PROTECTED]>
> >
> >> I might be wrong, but it seems the error comes from
> >> while(itr.hasNext()), not from the add to the queue, so I don't think
> >> it is related to the number of elements in the queue... maybe the
> >> field length?
> >>
> >> On 11/21/11, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >> > Internally, TOP is using a priority queue. It tries to be smart
> >> > about pulling off excess elements, but if you ask it for enough
> >> > elements, it can blow up, because the priority queue is going to
> >> > have n elements, where n is the ranking you want. This is
> >> > consistent with the stack trace, which died on updateTop, which is
> >> > when elements are added to the priority queue.
> >> >
> >> > Ruslan, how large are the limits you're setting? i.e. (int)(count *
> >> > (double)($THIRD_LEVELS_PERCENTAGE + $BOTS_PERCENTAGE))
> >> >
> >> > As far as TOP's implementation, I imagine you could get around the
> >> > issue by
Jonathan Coveney 2011-11-23, 07:45
Ruslan Al-Fakikh 2011-11-24, 11:55
Ruslan Al-Fakikh 2011-12-15, 14:57
Ruslan Al-Fakikh 2011-12-16, 13:32
Dmitriy Ryaboy 2011-12-16, 20:15
Ruslan Al-Fakikh 2011-12-22, 01:37
Ruslan Al-Fakikh 2011-12-27, 15:48
Jonathan Coveney 2011-12-28, 19:18
Ruslan Al-Fakikh 2012-01-06, 03:14
Jonathan Coveney 2012-01-06, 04:10
Ruslan Al-Fakikh 2011-12-28, 22:21