Pig user mailing list: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
This makes more sense: the grouping and the filter are on different columns.
I will open a JIRA soon.

What version of Pig and Hadoop are you using?

Thanks,
Prashant

On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[EMAIL PROTECTED]> wrote:

> Hi Prashant,
>
> Here is my script in full.
>
>
> raw = LOAD 'input' using MyCustomLoader();
>
> searches = FOREACH raw GENERATE
>                day, searchType,
>                FLATTEN(impBag) AS (adType, clickCount)
>            ;
>
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                type1 = FILTER searches BY adType == 'type1';
>                type2 = FILTER searches BY adType == 'type2';
>                GENERATE
>                    FLATTEN(group) AS (day, searchType),
>                    COUNT(searches) AS numSearches,
>                    SUM(searches.clickCount) AS clickCountPerSearchType,
>                    SUM(type1.clickCount) AS type1ClickCount,
>                    SUM(type2.clickCount) AS type2ClickCount;
>        }
> ;
>
> As you can see above, I am computing the counts by day and search type in
> clickCountPerSearchType, and for each of them I need the counts broken down
> by ad type.
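A common way to get both the per-group totals and the per-ad-type breakdown without nested FILTERs is to derive one click column per ad type before the GROUP, using a bincond such as `(adType == 'type1' ? clickCount : 0)`, so that every output becomes a plain COUNT or SUM over the group. The sketch below models that computation in plain Python, not Pig; it assumes each flattened row is a `(day, searchType, adType, clickCount)` tuple with no nulls, and the function name `conditional_sums` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed row shape after FLATTEN: (day, searchType, adType, clickCount)
Row = Tuple[str, str, str, int]
# (numSearches, clickCountPerSearchType, type1ClickCount, type2ClickCount)
Result = Tuple[int, int, int, int]


def conditional_sums(rows: Iterable[Row]) -> Dict[Tuple[str, str], Result]:
    """Aggregate per (day, searchType) in one pass, without per-type sub-bags."""
    # Only four running integers are kept per group, never the group's rows.
    acc: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for day, search_type, ad_type, clicks in rows:
        # Plain-Python equivalent of bincond columns derived before the GROUP.
        type1_clicks = clicks if ad_type == "type1" else 0
        type2_clicks = clicks if ad_type == "type2" else 0
        totals = acc[(day, search_type)]
        totals[0] += 1              # COUNT(searches)
        totals[1] += clicks         # SUM(searches.clickCount)
        totals[2] += type1_clicks   # SUM of the derived type1 column
        totals[3] += type2_clicks   # SUM of the derived type2 column
    return {key: (t[0], t[1], t[2], t[3]) for key, t in acc.items()}


rows = [
    ("2012-03-22", "web", "type1", 3),
    ("2012-03-22", "web", "type2", 5),
    ("2012-03-22", "web", "type1", 1),
    ("2012-03-22", "mobile", "type2", 2),
]
print(conditional_sums(rows))
# {('2012-03-22', 'web'): (3, 9, 4, 5), ('2012-03-22', 'mobile'): (1, 2, 0, 2)}
```

A FOREACH that applies only algebraic functions such as COUNT and SUM, with no nested block, is eligible for Pig's combiner, so the Pig version of this rewrite should also push much of the aggregation to the map side; that is worth confirming in the job's plan (for example with EXPLAIN).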
>
> Thanks for your help!
> Thanks,
> Rohini
>
>
> On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> <[EMAIL PROTECTED]> wrote:
>
> > Hi Rohini,
> >
> > From your query, it looks like you are already grouping by TYPE, so I am
> > not sure why you would want the SUM of, say, the "EMPLOYER" type within
> > "LOCATION" and vice versa. Your output is already broken down by TYPE.
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for the suggestion, Prashant. However, that will not work in my
> > > case.
> > >
> > > If I filter before the group and include the new field in the group as
> > > you suggested, I get the individual counts broken down by the select-field
> > > criteria. However, I also want the totals without taking the select
> > > fields into account. That is why I took the approach I described in my
> > > earlier emails.
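The objection here is that filtering before the GROUP drops the rows needed for the totals. One possible workaround is to label every row instead of filtering (rows matching neither case get a catch-all label), group by the original key plus the label, and then roll the much smaller per-label result up to the original key in a second aggregation. A plain-Python sketch of that idea, assuming rows are `(key, type, count)` tuples; `label`, `breakdown_with_totals`, the `OTHER` label, and the `INDUSTRY` type in the example are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (groupKey, type, count); groupKey stands in for
# the real grouping columns.
Row = Tuple[str, str, int]


def label(row_type: str) -> str:
    """bincond-style label; rows matching neither case are kept, not filtered."""
    if row_type in ("EMPLOYER", "LOCATION"):
        return row_type
    return "OTHER"


def breakdown_with_totals(
    rows: Iterable[Row],
) -> Tuple[Dict[Tuple[str, str], int], Dict[str, int]]:
    """Return (SUM(count) per (key, label), number of rows per key)."""
    per_label_sum: Counter = Counter()
    per_label_rows: Counter = Counter()
    # Stage 1: GROUP BY (key, label) over every row, with no prior FILTER.
    for key, row_type, count in rows:
        lbl = label(row_type)
        per_label_sum[(key, lbl)] += count
        per_label_rows[(key, lbl)] += 1
    # Stage 2: roll the small per-label result up to the original key.
    total_rows: Counter = Counter()
    for (key, _lbl), n in per_label_rows.items():
        total_rows[key] += n
    return dict(per_label_sum), dict(total_rows)


rows = [("k1", "EMPLOYER", 2), ("k1", "LOCATION", 3), ("k1", "INDUSTRY", 7), ("k2", "EMPLOYER", 4)]
sums, totals = breakdown_with_totals(rows)
print(sums)    # {('k1', 'EMPLOYER'): 2, ('k1', 'LOCATION'): 3, ('k1', 'OTHER'): 7, ('k2', 'EMPLOYER'): 4}
print(totals)  # {'k1': 3, 'k2': 1}
```

In Pig terms, the second stage would be another GROUP over the already aggregated relation, which is typically far smaller than the raw input.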
> > >
> > > Thanks
> > > Rohini
> > >
> > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Please pull your FILTER out of the GROUP BY and apply it earlier:
> > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > >
> > > > In this case, you could use a FILTER followed by a bincond to introduce
> > > > a new field "employerOrLocation", then do a GROUP BY and include the new
> > > > field in the GROUP BY clause.
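The FILTER, then bincond, then GROUP pipeline suggested here can be modeled in plain Python as follows. This is a sketch, not Pig: it assumes rows are `(groupKey, type, count)` tuples where `groupKey` stands in for `(item1, item2, item3)`, and it assumes the new `employerOrLocation` field holds the type name, since the exact bincond expression is not given; `filter_bincond_group` and the `INDUSTRY` type in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed row shape: (groupKey, type, count)
Row = Tuple[str, str, int]


def filter_bincond_group(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """SUM(count) per (groupKey, employerOrLocation)."""
    sums: Dict[Tuple[str, str], int] = defaultdict(int)
    for key, row_type, count in rows:
        # FILTER: keep only the two cases of interest, before any grouping.
        if row_type not in ("EMPLOYER", "LOCATION"):
            continue
        # bincond: (type == 'EMPLOYER' ? 'EMPLOYER' : 'LOCATION')
        employer_or_location = "EMPLOYER" if row_type == "EMPLOYER" else "LOCATION"
        # GROUP BY (groupKey, employerOrLocation), then SUM(count)
        sums[(key, employer_or_location)] += count
    return dict(sums)


rows = [("k1", "EMPLOYER", 2), ("k1", "LOCATION", 3), ("k1", "INDUSTRY", 7), ("k1", "EMPLOYER", 1)]
print(filter_bincond_group(rows))  # {('k1', 'EMPLOYER'): 3, ('k1', 'LOCATION'): 3}
```

Rows that the FILTER drops (here the `INDUSTRY` row) contribute to no output group, and each output group is keyed by the extra field rather than by the original key alone.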
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > My input data is 9 GB, and I am using 20 machines.
> > > > >
> > > > > My grouping criteria have two cases, so I want 1) counts by the
> > > > > criteria I have grouped on, and 2) counts of the two individual cases
> > > > > within each group.
> > > > >
> > > > > So my script in detail is:
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > > > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > > > >     GENERATE
> > > > >         FLATTEN(group) AS (item1, item2, item3, type),
> > > > >         SUM(selectedFields1.count) AS selectFields1Count,
> > > > >         SUM(selectedFields2.count) AS selectFields2Count,
> > > > >         COUNT(rawItems) AS groupCriteriaCount;
> > > > > };
> > > > >
> > > > > Is there a way to do this?
> > > > >
> > > > >
> > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > You are not doing grouping followed by counting. You are doing
> > > > > > grouping followed by filtering followed by counting.
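The difference Dmitriy points out is mainly about memory. For a plain GROUP followed by COUNT, the aggregate can be computed incrementally, but a nested FILTER inside the FOREACH works on each group's bag of tuples, so large groups have to be materialized first, which is a plausible source of the `GC overhead limit exceeded` error in this thread. The toy Python model below contrasts the two shapes under assumed `(groupKey, type)` rows; the function names are hypothetical. Both return the same counts, but the first holds every row in per-group lists before counting, while the second keeps one running integer per group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed toy row shape: (groupKey, type)
Row = Tuple[str, str]


def group_then_filter_then_count(rows: Iterable[Row], wanted: str) -> Dict[str, int]:
    # GROUP: every row is first collected into its group's list (the "bag").
    bags: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    # Nested FILTER + COUNT: each bag is scanned only after it is fully built.
    return {key: sum(1 for _, t in bag if t == wanted) for key, bag in bags.items()}


def count_while_streaming(rows: Iterable[Row], wanted: str) -> Dict[str, int]:
    # Same result, but only one running integer per group is kept in memory.
    counts: Dict[str, int] = defaultdict(int)
    for key, row_type in rows:
        counts[key] += 1 if row_type == wanted else 0
    return dict(counts)


rows = [("a", "EMPLOYER"), ("a", "LOCATION"), ("b", "LOCATION"), ("a", "EMPLOYER")]
print(group_then_filter_then_count(rows, "EMPLOYER"))  # {'a': 2, 'b': 0}
print(count_while_streaming(rows, "EMPLOYER"))         # {'a': 2, 'b': 0}
```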