Pig, mail # user - Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Prashant Kommireddi 2012-03-22, 20:27
This makes more sense: the grouping and the filter are on different columns.
I will open a JIRA soon.

What version of Pig and Hadoop are you using?

Thanks,
Prashant
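
For reference, one way to avoid the nested FILTERs in the script quoted below
is to compute per-ad-type click columns with a bincond before the GROUP, so
that the grouped FOREACH contains only COUNT and SUM. This is a sketch, not a
tested fix: it assumes adType can be declared as a chararray and clickCount as
a long (the schema produced by MyCustomLoader is not shown in the thread), and
the names tagged, type1Clicks and type2Clicks are made up for illustration.

```pig
raw = LOAD 'input' USING MyCustomLoader();

-- Assumed types: adType is a chararray, clickCount is a long.
searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType:chararray, clickCount:long);

-- Tag each row with its contribution to the per-ad-type totals.
-- Rows of the other ad type contribute 0.
tagged = FOREACH searches GENERATE
             day, searchType, clickCount,
             (adType == 'type1' ? clickCount : 0L) AS type1Clicks,
             (adType == 'type2' ? clickCount : 0L) AS type2Clicks;

groupedSearches = GROUP tagged BY (day, searchType) PARALLEL 50;

-- No nested FILTER: only aggregates over the grouped bag.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(tagged) AS numSearches,
             SUM(tagged.clickCount) AS clickCountPerSearchType,
             SUM(tagged.type1Clicks) AS type1ClickCount,
             SUM(tagged.type2Clicks) AS type2ClickCount;
```

Since the grouped FOREACH then uses only algebraic functions, Pig should be
able to apply the combiner, which usually reduces how much data each reducer
has to hold. One behavioral difference to be aware of: a group with no 'type1'
rows gets 0 here, where the nested-FILTER version would, as far as I know,
yield null.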

On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[EMAIL PROTECTED]> wrote:

> Hi Prashant,
>
> Here is my script in full.
>
>
> raw = LOAD 'input' using MyCustomLoader();
>
> searches = FOREACH raw GENERATE
>                day, searchType,
>                FLATTEN(impBag) AS (adType, clickCount)
>            ;
>
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                type1 = FILTER searches BY adType == 'type1';
>                type2 = FILTER searches BY adType == 'type2';
>                GENERATE
>                    FLATTEN(group) AS (day, searchType),
>                    COUNT(searches) AS numSearches,
>                    SUM(searches.clickCount) AS clickCountPerSearchType,
>                    SUM(type1.clickCount) AS type1ClickCount,
>                    SUM(type2.clickCount) AS type2ClickCount;
>        }
> ;
>
> As you can see above, I am totaling the click counts by day and search type
> in clickCountPerSearchType, and for each of those groups I need the counts
> broken down by ad type.
>
> Thanks for your help!
> Thanks,
> Rohini
>
>
> On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
> > Hi Rohini,
> >
> > From your query it looks like you are already grouping it by TYPE, so not
> > sure why you would want the SUM of, say "EMPLOYER" type in "LOCATION" and
> > vice-versa. Your output is already broken down by TYPE.
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for the suggestion, Prashant. However, that will not work in my
> > > case.
> > >
> > > If I filter before the group and include the new field in the group as
> > > you suggested, I get the individual counts broken down by the select
> > > field criteria. However, I also want the totals without taking the
> > > select fields into account. That is why I took the approach I described
> > > in my earlier emails.
> > >
> > > Thanks
> > > Rohini
> > >
> > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > >
> > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > >
> > > > In this case, you could use a FILTER followed by a bincond to
> > > > introduce a new field "employerOrLocation", then do a GROUP BY and
> > > > include the new field in the GROUP BY clause.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > My input data size is 9GB and I am using 20 machines.
> > > > >
> > > > > My grouping criteria has two cases, so I want 1) counts by the
> > > > > criteria I have grouped on and 2) counts of the two individual
> > > > > cases in each of my groups.
> > > > >
> > > > > So my script in detail is:
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > > > >     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > > > >     GENERATE
> > > > >         FLATTEN(group) AS (item1, item2, item3, type),
> > > > >         SUM(selectedFields1.count) AS selectFields1Count,
> > > > >         SUM(selectedFields2.count) AS selectFields2Count,
> > > > >         COUNT(rawItems) AS groupCriteriaCount;
> > > > > };
> > > > >
> > > > > Is there a way to do this?
> > > > >
> > > > >
> > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > You are not doing grouping followed by counting. You are doing
> > > > > > grouping followed by filtering followed by counting.