Pig >> mail # user >> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Please pull your FILTER out of the nested FOREACH and apply it before the GROUP BY:
http://pig.apache.org/docs/r0.9.1/perf.html#filter
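
A minimal sketch of that change for the original single-case script, assuming the field names from the LOAD statement quoted further down the thread. The declared types and the alias employerItems are assumptions for illustration, not something stated in the thread.

```pig
-- Field names come from the original script; the types are assumed.
rawItems = LOAD 'in' AS (item1:chararray, item2:chararray, item3:chararray,
                         type:chararray, count:long);

-- Filter first, so only EMPLOYER rows are shuffled to the reducers.
employerItems = FILTER rawItems BY type == 'EMPLOYER';

grouped = GROUP employerItems BY (item1, item2, item3, type);

-- No nested FILTER here: the FOREACH only applies SUM to a plain projection.
counts = FOREACH grouped GENERATE
             FLATTEN(group) AS (item1, item2, item3, type),
             SUM(employerItems.count) AS count;
```

With the filter moved ahead of the GROUP, the reducer no longer has to run a FILTER over each group's bag, and a FOREACH that only applies algebraic functions such as SUM should let Pig use the combiner.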

In this case, you could use a FILTER followed by a bincond to introduce a
new field "employerOrLocation", then do a group by and include the new
field in the GROUP BY clause.
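
Roughly, that could look like the sketch below. It assumes the field names from the scripts quoted further down; the declared types and the aliases relevant, tagged, total and rowCount are hypothetical names chosen for illustration.

```pig
-- Field names come from the original script; the types are assumed.
rawItems = LOAD 'in' AS (item1:chararray, item2:chararray, item3:chararray,
                         type:chararray, count:long);

-- Keep only the two cases of interest before any grouping happens.
relevant = FILTER rawItems BY type == 'EMPLOYER' OR type == 'LOCATION';

-- The bincond introduces the new field employerOrLocation.
tagged = FOREACH relevant GENERATE
             item1, item2, item3,
             ((type == 'EMPLOYER') ? 'EMPLOYER' : 'LOCATION') AS employerOrLocation,
             count;

-- The new field is part of the group key, so each case gets its own group.
grouped = GROUP tagged BY (item1, item2, item3, employerOrLocation);

counts = FOREACH grouped GENERATE
             FLATTEN(group) AS (item1, item2, item3, employerOrLocation),
             SUM(tagged.count) AS total,
             COUNT(tagged) AS rowCount;
```

This yields one output row per (item1, item2, item3) and case rather than two sums side by side in a single row; combining them into one row would need a further step that is not shown here.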

Thanks,
Prashant

On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:

> My input data size is 9GB and I am using 20 machines.
>
> My grouping criteria have two cases, so I want 1) counts by the criteria I
> have grouped on and 2) counts of the two individual cases in each group.
>
> So my script in detail is:
>
> counts = FOREACH grouped {
>     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
>     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
>     GENERATE
>         FLATTEN(group) AS (item1, item2, item3, type),
>         SUM(selectedFields1.count) AS selectFields1Count,
>         SUM(selectedFields2.count) AS selectFields2Count,
>         COUNT(rawItems) AS groupCriteriaCount;
> }
>
> Is there a way to do this?
>
>
> On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
> > You are not doing grouping followed by counting. You are doing grouping
> > followed by filtering followed by counting.
> > Try filtering before grouping.
> >
> > D
> >
> > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I have a Pig script which does a simple GROUP followed by counting, and I
> > > get this error. My data is certainly not big enough to cause this
> > > out-of-memory error. Is there a chance that this is because of some bug?
> > > Did anyone come across this kind of error before?
> > >
> > > I am using Pig 0.9.1 with Hadoop 0.20.205.
> > >
> > > My script:
> > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > >
> > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > >
> > > counts = FOREACH grouped {
> > >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > >     GENERATE
> > >         FLATTEN(group) AS (item1, item2, item3, type),
> > >         SUM(selectedFields.count) AS count;
> > > }
> > >
> > > Stack Trace:
> > >
> > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > >        at