Pig >> mail # user >> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
Rohini, it's also fine if you just reply with the stack trace here. I can
add it to the JIRA.

Thanks,
Prashant

On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Rohini,
>
> Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
>
> Can you please post the stack trace as a comment on it?
>
> Thanks,
> Prashant
>
>
> On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
>> Rohini,
>>
>> In the meantime, something like the following should work:
>>
>> raw = LOAD 'input' using MyCustomLoader();
>>
>> searches = FOREACH raw GENERATE
>>               day, searchType,
>>               FLATTEN(impBag) AS (adType, clickCount)
>>           ;
>>
>> searches_2 = FOREACH searches GENERATE *,
>>               ( adType == 'type1' ? clickCount : 0 ) AS type1_clickCount,
>>               ( adType == 'type2' ? clickCount : 0 ) AS type2_clickCount;
>>
>> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
>> counts = FOREACH groupedSearches {
>>                GENERATE
>>                   FLATTEN(group) AS (day, searchType),
>>                   COUNT(searches_2) AS numSearches,
>>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
>>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
>>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
>>       }
>> ;
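
The rewrite above swaps the two nested FILTER statements for per-row columns that hold either clickCount or 0, so every aggregate becomes a plain SUM or COUNT over the group. The following is a minimal sketch of that equivalence in plain Python rather than Pig; the sample rows and both function names are made up for illustration, and the in-memory lists only loosely stand in for Pig's bags. The second function needs just four running totals per group, which is presumably why this form puts less pressure on memory.

```python
from collections import defaultdict

# Hypothetical sample rows: (day, searchType, adType, clickCount)
ROWS = [
    ("2012-03-21", "web", "type1", 3),
    ("2012-03-21", "web", "type2", 5),
    ("2012-03-21", "web", "type1", 2),
    ("2012-03-22", "image", "type2", 7),
    ("2012-03-22", "image", "other", 1),
]


def nested_filter_totals(rows):
    """Model of the original script: collect each group's bag, then filter it twice."""
    bags = defaultdict(list)
    for day, search_type, ad_type, clicks in rows:
        bags[(day, search_type)].append((ad_type, clicks))

    out = {}
    for key, bag in bags.items():
        type1 = [c for a, c in bag if a == "type1"]  # FILTER ... BY adType == 'type1'
        type2 = [c for a, c in bag if a == "type2"]  # FILTER ... BY adType == 'type2'
        out[key] = (len(bag), sum(c for _, c in bag), sum(type1), sum(type2))
    return out


def conditional_column_totals(rows):
    """Model of the rewrite: add 0-or-clickCount per row and keep running totals."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for day, search_type, ad_type, clicks in rows:
        t = totals[(day, search_type)]
        t[0] += 1                                    # numSearches
        t[1] += clicks                               # clickCountPerSearchType
        t[2] += clicks if ad_type == "type1" else 0  # type1ClickCount
        t[3] += clicks if ad_type == "type2" else 0  # type2ClickCount
    return {key: tuple(t) for key, t in totals.items()}


if __name__ == "__main__":
    for key, value in sorted(conditional_column_totals(ROWS).items()):
        print(key, value)
```

Both functions return the same totals for this sample. One difference the model does not capture: as far as I know, Pig's SUM returns null for an empty bag, whereas the conditional columns yield 0 when a group has no rows of a given ad type.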
>>
>> 2012/3/22 Rohini U <[EMAIL PROTECTED]>
>>
>> > Thanks Prashant,
>> > I am using Pig 0.9.1 and Hadoop 0.20.205.
>> >
>> > Thanks,
>> > Rohini
>> >
>> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> >
>> > > This makes more sense: the grouping and the filter are on different
>> > > columns. I will open a JIRA soon.
>> > >
>> > > What version of Pig and Hadoop are you using?
>> > >
>> > > Thanks,
>> > > Prashant
>> > >
>> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hi Prashant,
>> > > >
>> > > > Here is my script in full.
>> > > >
>> > > >
>> > > > raw = LOAD 'input' using MyCustomLoader();
>> > > >
>> > > > searches = FOREACH raw GENERATE
>> > > >                day, searchType,
>> > > >                FLATTEN(impBag) AS (adType, clickCount)
>> > > >            ;
>> > > >
>> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
>> > > > counts = FOREACH groupedSearches{
>> > > >                type1 = FILTER searches BY adType == 'type1';
>> > > >                type2 = FILTER searches BY adType == 'type2';
>> > > >                GENERATE
>> > > >                    FLATTEN(group) AS (day, searchType),
>> > > >                    COUNT(searches) numSearches,
>> > > >                    SUM(clickCount) AS clickCountPerSearchType,
>> > > >                    SUM(type1.clickCount) AS type1ClickCount,
>> > > >                    SUM(type2.clickCount) AS type2ClickCount;
>> > > >        }
>> > > > ;
>> > > >
>> > > > As you can see above, I am summing the click counts by day and search
>> > > > type in clickCountPerSearchType, and for each of those groups I need
>> > > > the counts broken down by ad type.
>> > > >
>> > > > Thanks for your help!
>> > > > Thanks,
>> > > > Rohini
>> > > >
>> > > >
>> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > Hi Rohini,
>> > > > >
>> > > > > From your query it looks like you are already grouping by TYPE, so I
>> > > > > am not sure why you would want the SUM of, say, the "EMPLOYER" type in
>> > > > > "LOCATION" and vice versa. Your output is already broken down by TYPE.
>> > > > >
>> > > > > Thanks,
>> > > > > Prashant
>> > > > >
>> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <[EMAIL PROTECTED]> wrote:
>> > > > >
>> > > > > > Thanks for the suggestion, Prashant. However, that will not work in
>> > > > > > my case.
>> > > > > >
>> > > > > > If I filter before the group and include the new field in the group
>> > > > > > as you suggested, I get the individual counts broken by the select