Pig >> mail # user >> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Hi Rohini,

From your query it looks like you are already grouping by TYPE, so I am
not sure why you would want the SUM of, say, the "EMPLOYER" type in
"LOCATION" and vice versa. Your output is already broken down by TYPE.
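As a sketch (schema borrowed from the script later in the thread): since
TYPE is part of the GROUP key, the nested FILTERs can be dropped entirely,
and each output row's SUM already covers exactly one type:

    rawItems = LOAD 'in' AS (item1, item2, item3, type, count);
    grouped  = GROUP rawItems BY (item1, item2, item3, type);
    counts   = FOREACH grouped GENERATE
                   FLATTEN(group) AS (item1, item2, item3, type),
                   SUM(rawItems.count) AS typeCount,  -- the one type in this group's key
                   COUNT(rawItems)     AS groupCriteriaCount;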

Thanks,
Prashant

On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:

> Thanks for the suggestion Prashant. However, that will not work in my case.
>
> If I filter before the group and include the new field in the group as you
> suggested, I get the individual counts broken down by the select field
> criteria. However, I also want the totals without taking the select fields
> into account. That is why I took the approach I described in my earlier
> emails.
>
> Thanks
> Rohini
>
> On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[EMAIL PROTECTED]
> >wrote:
>
> > Please pull your FILTER out of GROUP BY and do it earlier
> > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >
> > In this case, you could use a FILTER followed by a bincond to introduce a
> > new field "employerOrLocation", then do a group by and include the new
> > field in the GROUP BY clause.
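> >
> > As a rough sketch (field names borrowed from your script; the
> > "employerOrLocation" name is just illustrative):
> >
> >     -- keep only the two types of interest, then tag each row
> >     filtered = FILTER rawItems BY (type == 'EMPLOYER') OR (type == 'LOCATION');
> >     tagged   = FOREACH filtered GENERATE item1, item2, item3, count,
> >                    (type == 'EMPLOYER' ? 'EMPLOYER' : 'LOCATION')
> >                        AS employerOrLocation;  -- assumed field name
> >     grouped  = GROUP tagged BY (item1, item2, item3, employerOrLocation);
> >     counts   = FOREACH grouped GENERATE FLATTEN(group),
> >                    SUM(tagged.count) AS count;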
> >
> > Thanks,
> > Prashant
> >
> > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > My input data size is 9GB and I am using 20 machines.
> > >
> > > My group criteria has two cases, so I want 1) counts by the criteria I
> > > have grouped on and 2) counts of the two individual cases in each of my
> > > groups.
> > >
> > > So my script in detail is:
> > >
> > > counts = FOREACH grouped {
> > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > >     GENERATE
> > >         FLATTEN(group) AS (item1, item2, item3, type),
> > >         SUM(selectedFields1.count) AS selectFields1Count,
> > >         SUM(selectedFields2.count) AS selectFields2Count,
> > >         COUNT(rawItems) AS groupCriteriaCount;
> > > }
> > >
> > > Is there a way to do this?
> > >
> > >
> > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > You are not doing grouping followed by counting; you are doing
> > > > grouping followed by filtering followed by counting.
> > > > Try filtering before grouping.
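> > > >
> > > > For example, a minimal sketch against the script in your first
> > > > message (filter first, then group; same field names):
> > > >
> > > >     filtered = FILTER rawItems BY type == 'EMPLOYER';
> > > >     grouped  = GROUP filtered BY (item1, item2, item3, type);
> > > >     counts   = FOREACH grouped GENERATE
> > > >                    FLATTEN(group) AS (item1, item2, item3, type),
> > > >                    SUM(filtered.count) AS count;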
> > > >
> > > > D
> > > >
> > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[EMAIL PROTECTED]>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a pig script which does a simple GROUPing followed by
> > > > > counting, and I get this error. My data is certainly not that big
> > > > > for it to cause this out-of-memory error. Is there a chance that
> > > > > this is because of some bug? Did anyone come across this kind of
> > > > > error before?
> > > > >
> > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > >
> > > > > My script:
> > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > >
> > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > > > >     GENERATE
> > > > >         FLATTEN(group) AS (item1, item2, item3, type),
> > > > >         SUM(selectedFields.count) AS count;
> > > > > }
> > > > >
> > > > > Stack Trace:
> > > > >
> > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child
> (main):
> > > > Error
> > > > > running child : java.lang.OutOfMemoryError: GC overhead limit
> > exceeded
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > >        at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)