Pig, mail # user - Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Prashant Kommireddi 2012-03-23, 19:46
Rohini, it's fine if you just reply with the stacktrace here. I can
add it to the JIRA.

Thanks,
Prashant

On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Rohini,
>
> Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
>
> Can you please post the stacktrace as a comment to it?
>
> Thanks,
> Prashant
>
>
> On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
>> Rohini,
>>
>> In the meantime, something like the following should work:
>>
>> raw = LOAD 'input' using MyCustomLoader();
>>
>> searches = FOREACH raw GENERATE
>>               day, searchType,
>>               FLATTEN(impBag) AS (adType, clickCount)
>>           ;
>>
>> searches_2 = FOREACH searches GENERATE
>>                *,
>>                ( adType == 'type1' ? clickCount : 0 ) AS type1_clickCount,
>>                ( adType == 'type2' ? clickCount : 0 ) AS type2_clickCount;
>>
>> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
>> counts = FOREACH groupedSearches {
>>                GENERATE
>>                    FLATTEN(group) AS (day, searchType),
>>                    COUNT(searches_2) AS numSearches,
>>                    SUM(searches_2.clickCount) AS clickCountPerSearchType,
>>                    SUM(searches_2.type1_clickCount) AS type1ClickCount,
>>                    SUM(searches_2.type2_clickCount) AS type2ClickCount;
>>        }
>> ;
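For anyone who wants to try this approach outside the custom loader, below is a minimal, self-contained sketch of the same bincond workaround. The file name 'searches.txt', its (day, searchType, adType, clickCount) schema, and the use of PigStorage are assumptions standing in for MyCustomLoader's output; it should run with `pig -x local` against a small tab-delimited test file.

-- Stand-alone sketch of the bincond workaround. File name, schema and
-- PigStorage are assumptions replacing MyCustomLoader; run with `pig -x local`.
raw = LOAD 'searches.txt' USING PigStorage('\t')
          AS (day:chararray, searchType:chararray, adType:chararray, clickCount:long);

-- Turn the per-row adType into two per-row columns so no nested FILTER is needed.
searches_2 = FOREACH raw GENERATE
                 day, searchType, adType, clickCount,
                 (adType == 'type1' ? clickCount : 0L) AS type1_clickCount,
                 (adType == 'type2' ? clickCount : 0L) AS type2_clickCount;

groupedSearches = GROUP searches_2 BY (day, searchType);

-- Plain aggregations over the grouped bag; no per-group filtering required.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(searches_2) AS numSearches,
             SUM(searches_2.clickCount) AS clickCountPerSearchType,
             SUM(searches_2.type1_clickCount) AS type1ClickCount,
             SUM(searches_2.type2_clickCount) AS type2ClickCount;

DUMP counts;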
>>
>> 2012/3/22 Rohini U <[EMAIL PROTECTED]>
>>
>> > Thanks Prashant,
>> > I am using Pig 0.9.1 and hadoop 0.20.205
>> >
>> > Thanks,
>> > Rohini
>> >
>> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> >
>> > > This makes more sense; the grouping and filter are on different
>> > > columns. I will open a JIRA soon.
>> > >
>> > > What version of Pig and Hadoop are you using?
>> > >
>> > > Thanks,
>> > > Prashant
>> > >
>> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hi Prashant,
>> > > >
>> > > > Here is my script in full.
>> > > >
>> > > >
>> > > > raw = LOAD 'input' using MyCustomLoader();
>> > > >
>> > > > searches = FOREACH raw GENERATE
>> > > >                day, searchType,
>> > > >                FLATTEN(impBag) AS (adType, clickCount)
>> > > >            ;
>> > > >
>> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
>> > > > counts = FOREACH groupedSearches {
>> > > >                type1 = FILTER searches BY adType == 'type1';
>> > > >                type2 = FILTER searches BY adType == 'type2';
>> > > >                GENERATE
>> > > >                    FLATTEN(group) AS (day, searchType),
>> > > >                    COUNT(searches) AS numSearches,
>> > > >                    SUM(searches.clickCount) AS clickCountPerSearchType,
>> > > >                    SUM(type1.clickCount) AS type1ClickCount,
>> > > >                    SUM(type2.clickCount) AS type2ClickCount;
>> > > >        }
>> > > > ;
>> > > >
>> > > > As you can see above, I am counting searches by day and search type
>> > > > in clickCountPerSearchType, and for each of those I need the click
>> > > > counts broken down by ad type.
>> > > >
>> > > > Thanks for your help!
>> > > > Thanks,
>> > > > Rohini
>> > > >
>> > > >
>> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > Hi Rohini,
>> > > > >
>> > > > > From your query it looks like you are already grouping it by TYPE,
>> > > > > so I am not sure why you would want the SUM of, say, the "EMPLOYER"
>> > > > > type in "LOCATION" and vice versa. Your output is already broken
>> > > > > down by TYPE.
>> > > > >
>> > > > > Thanks,
>> > > > > Prashant
>> > > > >
>> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:
>> > > > >
>> > > > > > Thanks for the suggestion, Prashant. However, that will not work
>> > > > > > in my case.
>> > > > > >
>> > > > > > If I filter before the group and include the new field in the group
>> > > > > > as you suggested, I get the individual counts broken by the select