Pig >> mail # user >> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Hi Rohini,

From your query it looks like you are already grouping by TYPE, so I am not
sure why you would want the SUM of, say, the "EMPLOYER" type in "LOCATION" and
vice versa. Your output is already broken down by TYPE.

Thanks,
Prashant
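Prashant's point can be illustrated with a small Python simulation of the Pig grouping (the sample rows and values here are hypothetical, chosen only to mirror the thread's (item1, item2, item3, type, count) schema):

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the (item1, item2, item3, type, count) schema.
rows = [
    ("a", "b", "c", "EMPLOYER", 2),
    ("a", "b", "c", "EMPLOYER", 3),
    ("a", "b", "c", "LOCATION", 5),
]

# Group by (item1, item2, item3, type), as in the script's GROUP BY clause.
groups = defaultdict(list)
for r in rows:
    groups[r[:4]].append(r)

# Because type is part of the group key, every row in a group shares one type:
# a per-type conditional sum inside the group is either the whole group's sum
# (when the group's own type matches) or zero (when it does not).
for key, members in groups.items():
    employer_sum = sum(c for *_, t, c in members if t == "EMPLOYER")
    total = sum(c for *_, c in members)
    if key[3] == "EMPLOYER":
        assert employer_sum == total  # every row in this group has that type
    else:
        assert employer_sum == 0      # no row in this group can match
```

This is why filtering on `type` inside a FOREACH that already groups by `type` adds no information.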

On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:

> Thanks for the suggestion Prashant. However, that will not work in my case.
>
> If I filter before the group and include the new field in group as you
> suggested, I get the individual counts broken down by the select-field
> criteria. However, I want the totals also without taking the select fields
> into account. That is why I took the approach I described in my earlier
> emails.
>
> Thanks
> Rohini
>
> On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
> > Please pull your FILTER out of GROUP BY and do it earlier
> > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >
> > In this case, you could use a FILTER followed by a bincond to introduce a
> > new field "employerOrLocation", then do a group by and include the new
> > field in the GROUP BY clause.
> >
> > Thanks,
> > Prashant
> >
> > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > My input data size is 9GB and I am using 20 machines.
> > >
> > > My grouping criteria have two cases, so I want 1) counts by the criteria I
> > > grouped on and 2) counts of the two individual cases within each group.
> > >
> > > So my script in detail is:
> > >
> > > counts = FOREACH grouped {
> > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > >     GENERATE
> > >         FLATTEN(group) AS (item1, item2, item3, type),
> > >         SUM(selectedFields1.count) AS selectFields1Count,
> > >         SUM(selectedFields2.count) AS selectFields2Count,
> > >         COUNT(rawItems) AS groupCriteriaCount;
> > > }
> > >
> > > Is there a way to do this?
> > >
> > >
> > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > You are not doing grouping followed by counting; you are doing grouping
> > > > followed by filtering followed by counting.
> > > > Try filtering before grouping.
> > > >
> > > > D
> > > >
> > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[EMAIL PROTECTED]>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a Pig script which does a simple GROUP followed by counting, and I
> > > > > get this error. My data is certainly not big enough to cause this
> > > > > out-of-memory error. Is there a chance that this is because of some bug?
> > > > > Did anyone come across this kind of error before?
> > > > >
> > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > >
> > > > > My script:
> > > > > rawItems = LOAD 'in' AS (item1, item2, item3, type, count);
> > > > >
> > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > > > >     GENERATE
> > > > >         FLATTEN(group) AS (item1, item2, item3, type),
> > > > >         SUM(selectedFields.count) AS count;
> > > > > }
> > > > >
> > > > > Stack Trace:
> > > > >
> > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error
> > > > > running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)