Pig >> mail # user >> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Sure I can do that. Isn't this something that should be done already? Or
does it not work if the filter is working on a field that is part of the
group?

On Wed, Mar 21, 2012 at 11:02 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Prashant, mind filing a jira with this example? Technically, this is
> something we could do automatically.
>
> On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
> > Please pull your FILTER out of GROUP BY and do it earlier
> > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >
> > In this case, you could use a FILTER followed by a bincond to introduce a
> > new field "employerOrLocation", then do a group by and include the new
> > field in the GROUP BY clause.
> >
> > Thanks,
> > Prashant
> >
> > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > My input data size is 9GB and I am using 20 machines.
> > >
> > > My grouping criteria have two cases, so I want 1) counts by the criteria I
> > > have grouped on, and 2) counts of the two individual cases in each of my groups.
> > >
> > > So my script in detail is:
> > >
> > > counts = FOREACH grouped {
> > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > >     GENERATE
> > >         FLATTEN(group) as (item1, item2, item3, type),
> > >         SUM(selectedFields1.count) as selectFields1Count,
> > >         SUM(selectedFields2.count) as selectFields2Count,
> > >         COUNT(rawItems) as groupCriteriaCount;
> > > }
> > >
> > > Is there a way to do this?
> > >
> > >
> > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >
> > > > You are not doing grouping followed by counting. You are doing grouping
> > > > followed by filtering followed by counting.
> > > > Try filtering before grouping.
> > > >
> > > > D
> > > >
> > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a Pig script which does a simple GROUP followed by counting, and I
> > > > > get this error. My data is certainly not big enough to cause this
> > > > > out of memory error. Is there a chance that this is because of some bug?
> > > > > Did anyone come across this kind of error before?
> > > > >
> > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > >
> > > > > My script:
> > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > >
> > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > > > >     GENERATE
> > > > >         FLATTEN(group) as (item1, item2, item3, type),
> > > > >         SUM(selectedFields.count) as count;
> > > > > }
> > > > >
> > > > > Stack Trace:
> > > > >
> > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
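
For reference, the rewrite suggested in the thread (move the per-type test out of the nested FOREACH and express it as a bincond before the GROUP) might look roughly like the sketch below. This is not taken from any message in the thread. The relation name `tagged`, the fields `employerCount` and `locationCount`, and the declared column types are assumptions added for illustration; the original LOAD declares no types. With no nested FILTER, SUM and COUNT are applied directly to the grouped bag, which should let Pig use the combiner rather than hold each group's full bag in memory on the reducer.

```pig
-- Column types are an assumption; the original script loads untyped fields.
rawItems = LOAD 'in' AS (item1:chararray, item2:chararray, item3:chararray,
                         type:chararray, count:long);

-- Compute each row's contribution to the two per-type sums before grouping.
-- 'tagged', 'employerCount' and 'locationCount' are hypothetical names.
tagged = FOREACH rawItems GENERATE
    item1, item2, item3, type,
    ((type == 'EMPLOYER') ? count : 0L) AS employerCount,
    ((type == 'LOCATION') ? count : 0L) AS locationCount;

grouped = GROUP tagged BY (item1, item2, item3, type);

-- No nested FILTER here: only algebraic functions over the grouped bag.
counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (item1, item2, item3, type),
    SUM(tagged.employerCount) AS selectFields1Count,
    SUM(tagged.locationCount) AS selectFields2Count,
    COUNT(tagged) AS groupCriteriaCount;
```

One difference from the nested-FILTER version: a group with no EMPLOYER (or LOCATION) rows yields 0 for that sum here, where SUM over an empty filtered bag would yield null.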