Pig >> mail # user >> Java heap error


Re: Java heap error
Hi Syed,

Do you mean your query fails with an OOME if you use Pig's builtin SUM,
but succeeds if you use your own SUM UDF? If so, that's
interesting. I have a hunch about why that is the case, but would like to
confirm it. Would you mind sharing your SUM UDF?
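For reference, the last two steps of the script quoted below (the `(cond ? x : 0)` columns in Final_table, then GROUP ... BY ID and SUM) amount to a conditional aggregation, i.e. a pivot. The sketch below computes the same four sums in plain Java, outside Pig. It is not Syed's UDF; the `Row` record, the field layout, and the class and method names are hypothetical. It keeps only four running totals per ID rather than holding each group's rows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: mirrors the bincond columns of Final_table followed by
// GROUP ... BY ID and SUM. Row and all names here are hypothetical.
public class PivotSumSketch {

    // One row of the unioned relation: ID, type, b, cnt, cnt_distinct.
    record Row(String id, String type, int b, int cnt, int cntDistinct) {}

    // Returns, per ID, {sum cnt for b==1, sum cnt for b==2,
    //                   sum cnt_distinct for b==1, sum cnt_distinct for b==2},
    // counting only rows whose type is "S".
    static Map<String, long[]> pivotSum(Iterable<Row> rows) {
        Map<String, long[]> sums = new LinkedHashMap<>();
        for (Row r : rows) {
            boolean isS = "S".equals(r.type());
            // Same conditions as the (cond ? x : 0) expressions in Final_table.
            long cnt12 = (isS && r.b() == 1) ? r.cnt() : 0;
            long cnt13 = (isS && r.b() == 2) ? r.cnt() : 0;
            long dst12 = (isS && r.b() == 1) ? r.cntDistinct() : 0;
            long dst13 = (isS && r.b() == 2) ? r.cntDistinct() : 0;
            // Fold into the running totals for this ID; rows are not retained.
            long[] acc = sums.computeIfAbsent(r.id(), k -> new long[4]);
            acc[0] += cnt12;
            acc[1] += cnt13;
            acc[2] += dst12;
            acc[3] += dst13;
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", "S", 1, 5, 2),
            new Row("a", "S", 2, 3, 1),
            new Row("a", "S", 1, 4, 4),
            new Row("b", "S", 2, 7, 3),
            new Row("b", "T", 1, 10, 10)); // non-"S" rows contribute zeros
        pivotSum(rows).forEach((id, v) -> System.out.println(id + " " + Arrays.toString(v)));
    }
}
```

Running `main` prints `a [9, 3, 6, 1]` and `b [0, 7, 0, 3]`.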

Ashutosh
On Fri, Jul 9, 2010 at 12:50, Syed Wasti <[EMAIL PROTECTED]> wrote:
> Hi Ashutosh,
> I did not try options 2 and 3; I will work on those sometime next week.
> Increasing the heap size alone did not help at first. With the increased heap
> size, I wrote a UDF to do the SUM on the grouped data in the last step of my
> script, and the query now completes without any errors.
>
> Syed
>
>
> On 7/8/10 5:58 PM, "Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
>
>> Ah, I forgot to say how to set the param in 3). When launching
>> Pig, pass it as a -D command-line switch, as follows:
>> pig -Dpig.cachedbag.memusage=0.02f myscript.pig
>>
>> On Thu, Jul 8, 2010 at 17:45, Ashutosh Chauhan
>> <[EMAIL PROTECTED]> wrote:
>>> I would recommend the following, in this order:
>>>
>>> 1) Increasing the heap size should help.
>>> 2) It seems you are on 0.7. There are a couple of memory fixes we have
>>> committed to both the 0.7 branch and trunk. Those should help as
>>> well, so build Pig from either trunk or the 0.7 branch and use that.
>>> 3) Only if these don't help, try tuning the param
>>> pig.cachedbag.memusage. By default it is set to 0.1; lowering it
>>> should help. Try 0.05, then 0.02, and then further down. The downside is
>>> that the lower you go, the slower your query will run.
>>>
>>> Let us know if these changes get your query to completion.
>>>
>>> Ashutosh
>>>
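To put numbers on option 3): assuming pig.cachedbag.memusage is read as a fraction of the task JVM's maximum heap (an assumption here, not taken from the Pig source), the sketch below computes the resulting memory budget for cached bags at 0.1, 0.05 and 0.02. The 1 GB heap is a hypothetical example. Going from the default 0.1 to 0.02 cuts that budget five-fold, so bags presumably spill to disk much sooner, which is consistent with the slowdown mentioned above.

```java
// Back-of-the-envelope sketch. ASSUMPTION: pig.cachedbag.memusage is a
// fraction of the JVM's maximum heap. The heap size is hypothetical.
public class CachedBagBudget {

    // Bytes available to cached bags for a given max heap and memusage fraction.
    static long budgetBytes(long maxHeapBytes, double memUsage) {
        if (memUsage < 0 || memUsage > 1) {
            throw new IllegalArgumentException("memusage must be in [0, 1]");
        }
        return (long) (maxHeapBytes * memUsage);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // hypothetical 1 GB task heap
        for (double f : new double[] {0.1, 0.05, 0.02}) {
            System.out.printf("memusage=%.2f -> %d MB%n", f, budgetBytes(heap, f) / (1024 * 1024));
        }
        // Inside a running task, Runtime.getRuntime().maxMemory() gives the actual max heap.
    }
}
```

With the 1 GB heap this prints roughly 102, 51 and 20 MB for the three settings.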
>>> On Thu, Jul 8, 2010 at 15:48, Syed Wasti <[EMAIL PROTECTED]> wrote:
>>>> Thanks, Ashutosh. Is there any workaround for this? Will increasing the heap
>>>> size help?
>>>>
>>>>
>>>> On 7/8/10 1:59 PM, "Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Syed,
>>>>>
>>>>> You have likely hit PIG-1442 (https://issues.apache.org/jira/browse/PIG-1442).
>>>>> Your query and stack trace look very similar to the ones in that JIRA
>>>>> ticket. This may get fixed by the 0.8 release.
>>>>>
>>>>> Ashutosh
>>>>>
>>>>> On Thu, Jul 8, 2010 at 13:42, Syed Wasti <[EMAIL PROTECTED]> wrote:
>>>>>> Sorry about the delay; I was held up with other things.
>>>>>> Here are the script and the errors:
>>>>>>
>>>>>> AA = LOAD 'table1' USING PigStorage('\t') as
>>>>>> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>>>>>>
>>>>>> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>>>>>>
>>>>>> AC = FILTER AB BY o == 1;
>>>>>>
>>>>>> AD = GROUP AC BY (ID, b);
>>>>>>
>>>>>> AE = FOREACH AD { A = DISTINCT AC.d;
>>>>>>        GENERATE group.ID, (chararray) 'S' AS type, group.b, (int)
>>>>>> COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct; }
>>>>>>
>>>>>> The same steps are repeated to load 5 different tables and then a UNION is
>>>>>> done on them.
>>>>>>
>>>>>> Final_res = UNION AE, AF, AG, AH, AI;
>>>>>>
>>>>>> The actual script has 15 such columns; here I am showing only the ones for one table.
>>>>>>
>>>>>> Final_table =   FOREACH Final_res GENERATE ID,
>>>>>>                (type == 'S' AND b == 1 ? cnt : 0) AS tmp_12,
>>>>>>                (type == 'S' AND b == 2 ? cnt : 0) AS tmp_13,
>>>>>>                (type == 'S' AND b == 1 ? cnt_distinct : 0) AS distinct_tmp_12,
>>>>>>                (type == 'S' AND b == 2 ? cnt_distinct : 0) AS distinct_tmp_13;
>>>>>>
>>>>>> It works fine up to this point; it is only after adding this last part
>>>>>> of the query that it starts throwing heap errors.
>>>>>>
>>>>>> grp_id =    GROUP Final_table BY ID;
>>>>>>
>>>>>> Final_data = FOREACH grp_id GENERATE group AS ID,
>>>>>> SUM(Final_table.tmp_12), SUM(Final_table.tmp_13),
>>>>>> SUM(Final_table.distinct_tmp_12), SUM(Final_table.distinct_tmp_13);
>>>>>>
>>>>>> STORE Final_data INTO '<output_path>';
>>>>>>
>>>>>>
>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>  at java.util.ArrayList.<init>(ArrayList.java:112)
>>>>>>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)