Pig, mail # user - Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Prashant Kommireddi 2012-03-23, 02:10
Rohini,

Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610

Can you please post the stacktrace as a comment to it?

Thanks,
Prashant

On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Rohini,
>
> In the meantime, something like the following should work:
>
> raw = LOAD 'input' using MyCustomLoader();
>
> searches = FOREACH raw GENERATE
>               day, searchType,
>               FLATTEN(impBag) AS (adType, clickCount)
>           ;
>
> searches_2 = FOREACH searches GENERATE *,
>               ( adType == 'type1' ? clickCount : 0 ) AS type1_clickCount,
>               ( adType == 'type2' ? clickCount : 0 ) AS type2_clickCount;
>
> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                GENERATE
>                   FLATTEN(group) AS (day, searchType),
>                   COUNT(searches_2) AS numSearches,
>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
>       }
> ;
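
To illustrate why the rewrite above should produce the same numbers as the nested FILTER version, here is a minimal Python sketch. It is not Pig and not what Pig runs internally. The sample rows, the function names, and the date and search-type values are all made up for illustration. The first function collects each group's bag and filters it per ad type, as the original script does. The second adds per-row conditional values and keeps running sums, as the workaround does.

```python
from collections import defaultdict

# Hypothetical flattened rows: (day, searchType, adType, clickCount)
rows = [
    ("2012-03-21", "web", "type1", 3),
    ("2012-03-21", "web", "type2", 5),
    ("2012-03-21", "web", "type3", 2),
    ("2012-03-21", "image", "type1", 1),
]


def nested_filter_counts(rows):
    """Mimic the original script: build each group's bag, then filter it per ad type."""
    groups = defaultdict(list)
    for day, search_type, ad_type, clicks in rows:
        groups[(day, search_type)].append((ad_type, clicks))
    out = {}
    for key, bag in groups.items():
        type1 = [c for t, c in bag if t == "type1"]
        type2 = [c for t, c in bag if t == "type2"]
        out[key] = (len(bag), sum(c for _, c in bag), sum(type1), sum(type2))
    return out


def conditional_column_counts(rows):
    """Mimic the workaround: per-row conditional values, then running sums per group."""
    acc = defaultdict(lambda: [0, 0, 0, 0])
    for day, search_type, ad_type, clicks in rows:
        totals = acc[(day, search_type)]
        totals[0] += 1                                      # numSearches
        totals[1] += clicks                                 # clickCountPerSearchType
        totals[2] += clicks if ad_type == "type1" else 0    # type1ClickCount
        totals[3] += clicks if ad_type == "type2" else 0    # type2ClickCount
    return {key: tuple(totals) for key, totals in acc.items()}


# Both approaches agree on the sample data.
assert nested_filter_counts(rows) == conditional_column_counts(rows)
```

The second form never needs a filtered copy of a group's bag in memory, which is presumably why it avoids the GC pressure here. One difference worth checking on real data: as far as I know, SUM over an empty filtered bag yields null in Pig, while the bincond version yields 0.
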
>
> 2012/3/22 Rohini U <[EMAIL PROTECTED]>
>
> > Thanks Prashant,
> > I am using Pig 0.9.1 and Hadoop 0.20.205.
> >
> > Thanks,
> > Rohini
> >
> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> >
> > > This makes more sense; the grouping and the filter are on different
> > > columns. I will open a JIRA soon.
> > >
> > > What version of Pig and Hadoop are you using?
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Prashant,
> > > >
> > > > Here is my script in full.
> > > >
> > > >
> > > > raw = LOAD 'input' using MyCustomLoader();
> > > >
> > > > searches = FOREACH raw GENERATE
> > > >                day, searchType,
> > > >                FLATTEN(impBag) AS (adType, clickCount)
> > > >            ;
> > > >
> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> > > > counts = FOREACH groupedSearches{
> > > >                type1 = FILTER searches BY adType == 'type1';
> > > >                type2 = FILTER searches BY adType == 'type2';
> > > >                GENERATE
> > > >                    FLATTEN(group) AS (day, searchType),
> > > >                    COUNT(searches) AS numSearches,
> > > >                    SUM(searches.clickCount) AS clickCountPerSearchType,
> > > >                    SUM(type1.clickCount) AS type1ClickCount,
> > > >                    SUM(type2.clickCount) AS type2ClickCount;
> > > >        }
> > > > ;
> > > >
> > > > As you can see above, I am summing the click counts by day and search
> > > > type in clickCountPerSearchType, and for each of them I need the counts
> > > > broken down by ad type.
> > > >
> > > > Thanks for your help!
> > > > Thanks,
> > > > Rohini
> > > >
> > > >
> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Rohini,
> > > > >
> > > > > From your query it looks like you are already grouping it by TYPE, so
> > > > > not sure why you would want the SUM of, say "EMPLOYER" type in
> > > > > "LOCATION" and vice-versa. Your output is already broken down by TYPE.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Thanks for the suggestion, Prashant. However, that will not work in
> > > > > > my case.
> > > > > >
> > > > > > If I filter before the group and include the new field in the group
> > > > > > as you suggested, I get the individual counts broken down by the
> > > > > > select field criteria. However, I also want the totals without
> > > > > > taking the select fields into account. That is why I took the
> > > > > > approach I described in my earlier emails.
> > > > > >
> > > > > > Thanks
> > > > > > Rohini
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <