Pig, mail # user - Java heap error


Syed Wasti 2010-07-07, 21:09
Ashutosh Chauhan 2010-07-08, 00:50
Syed Wasti 2010-07-08, 20:42
Ashutosh Chauhan 2010-07-08, 20:59
Syed Wasti 2010-07-08, 22:48
Ashutosh Chauhan 2010-07-09, 00:45
Ashutosh Chauhan 2010-07-09, 00:58
Syed Wasti 2010-07-09, 19:50
Re: Java heap error
Ashutosh Chauhan 2010-07-09, 21:32
Hi Syed,

Do you mean that your query fails with an OOME if you use Pig's builtin SUM,
but succeeds if you use your own SUM UDF? If so, that's
interesting. I have a hunch about why that is the case, but I would like to
confirm it. Would you mind sharing your SUM UDF?

Ashutosh
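For context on the builtin-versus-UDF question: Pig's builtin SUM is written as an algebraic function (it implements Pig's Algebraic interface, whose getInitial/getIntermed/getFinal methods name the classes used for each stage), so Pig can compute partial sums in the combiner instead of collecting a whole group's values in one bag. The plain-Java sketch below only illustrates that Initial/Intermediate/Final decomposition; it has no Pig dependency, is not Pig's implementation, and the class name, method names, and sample values are hypothetical. Null handling is simplified (nulls are skipped).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java illustration of the Initial/Intermediate/Final
 * split behind an algebraic SUM. Each stage keeps only a running total,
 * never the full list of values for a group.
 */
public class AlgebraicSumSketch {

    // Initial stage: runs over a slice of raw input values (map side).
    public static long initial(List<Long> values) {
        long sum = 0L;
        for (Long v : values) {
            if (v != null) { // simplified: nulls are skipped
                sum += v;
            }
        }
        return sum;
    }

    // Intermediate stage: merges partial sums (combiner side).
    public static long intermediate(List<Long> partials) {
        return initial(partials);
    }

    // Final stage: merges the combined partial sums (reduce side).
    public static long finalSum(List<Long> partials) {
        return initial(partials);
    }

    public static void main(String[] args) {
        // Hypothetical values for one group key, split across two map tasks.
        List<Long> mapA = List.of(3L, 4L);
        List<Long> mapB = new ArrayList<>(List.of(5L));
        mapB.add(null);

        long partialA = initial(mapA);
        long partialB = initial(mapB);
        long combined = intermediate(List.of(partialA, partialB));
        System.out.println(finalSum(List.of(combined)));
    }
}
```

Running main prints the total for the one hypothetical group; the point of the split is that no stage needs more than one number per group in memory.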
On Fri, Jul 9, 2010 at 12:50, Syed Wasti <[EMAIL PROTECTED]> wrote:
> Hi Ashutosh,
> I did not try options 2 and 3; I shall work on that sometime next week.
> Increasing the heap size alone did not help initially. With the increased heap
> size, I wrote a UDF to do the SUM on the grouped data for the last
> step in my script, and it now completes my query without any errors.
>
> Syed
>
>
> On 7/8/10 5:58 PM, "Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
>
>> Aah.. I forgot to tell you how to set the param in 3). When launching
>> pig, provide it as a -D command-line switch, as follows:
>> pig -Dpig.cachedbag.memusage=0.02f myscript.pig
>>
>> On Thu, Jul 8, 2010 at 17:45, Ashutosh Chauhan
>> <[EMAIL PROTECTED]> wrote:
>>> I recommend the following, in this order:
>>>
>>> 1) Increasing the heap size should help.
>>> 2) It seems you are on 0.7. We have committed a couple of memory fixes
>>> on both the 0.7 branch and trunk. Those should help as
>>> well, so build Pig from either trunk or the 0.7 branch and use that.
>>> 3) Only if these don't help, try tuning the param
>>> pig.cachedbag.memusage. By default it is set to 0.1; lowering it
>>> should help. Try 0.05, 0.02, and then further down. The downside is that
>>> the lower you go, the slower your query will run.
>>>
>>> Let us know if these changes get your query to completion.
>>>
>>> Ashutosh
>>>
>>> On Thu, Jul 8, 2010 at 15:48, Syed Wasti <[EMAIL PROTECTED]> wrote:
>>>> Thanks, Ashutosh. Is there any workaround for this? Will increasing the heap
>>>> size help?
>>>>
>>>>
>>>> On 7/8/10 1:59 PM, "Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Syed,
>>>>>
>>>>> You are likely hitting https://issues.apache.org/jira/browse/PIG-1442 .
>>>>> Your query and stack trace look very similar to the ones in the jira
>>>>> ticket. This may get fixed by the 0.8 release.
>>>>>
>>>>> Ashutosh
>>>>>
>>>>> On Thu, Jul 8, 2010 at 13:42, Syed Wasti <[EMAIL PROTECTED]> wrote:
>>>>>> Sorry about the delay; I was held up with other things.
>>>>>> Here are the script and the errors:
>>>>>>
>>>>>> AA = LOAD 'table1' USING PigStorage('\t') as
>>>>>> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>>>>>>
>>>>>> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>>>>>>
>>>>>> AC = FILTER AB BY o == 1;
>>>>>>
>>>>>> AD = GROUP AC BY (ID, b);
>>>>>>
>>>>>> AE = FOREACH AD { A = DISTINCT AC.d;
>>>>>>        GENERATE group.ID, (chararray) 'S' AS type, group.b,
>>>>>>        (int) COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct; }
>>>>>>
>>>>>> The same steps are repeated to load 5 different tables and then a UNION is
>>>>>> done on them.
>>>>>>
>>>>>> Final_res = UNION AE, AF, AG, AH, AI;
>>>>>>
>>>>>> The actual number of columns here will be 15; I am showing only those for one table.
>>>>>>
>>>>>> Final_table =   FOREACH Final_res GENERATE ID,
>>>>>>                (type == 'S' AND b == 1?cnt:0) AS 12_tmp,
>>>>>>                (type == 'S' AND b == 2?cnt:0) AS 13_tmp,
>>>>>>                (type == 'S' AND b == 1?cnt_distinct:0) AS 12_distinct_tmp,
>>>>>>                (type == 'S' AND b == 2?cnt_distinct:0) AS 13_distinct_tmp;
>>>>>>
>>>>>> It works fine up to this point; only after adding this last part of the
>>>>>> query does it start throwing heap errors.
>>>>>>
>>>>>> grp_id = GROUP Final_table BY ID;
>>>>>>
>>>>>> Final_data = FOREACH grp_id GENERATE group AS ID,
>>>>>> SUM(Final_table.12_tmp), SUM(Final_table.13_tmp),
>>>>>> SUM(Final_table.12_distinct_tmp), SUM(Final_table.13_distinct_tmp);
>>>>>>
>>>>>> STORE Final_data;
>>>>>>
>>>>>>
>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>  at java.util.ArrayList.<init>(ArrayList.java:112)
>>>>>>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
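The pig.cachedbag.memusage tuning suggested earlier in this thread trades speed for memory. As a rough, hedged illustration of why, the sketch below assumes the setting is applied as a fraction of the JVM's maximum heap, shared evenly across the bags held in memory at once, and that a bag spills to disk once its estimated size exceeds that share. The class name, method names, and the 200-byte average tuple size are hypothetical; this is not Pig's code.

```java
/**
 * Hypothetical sketch of a memory-fraction spill threshold in the spirit of
 * pig.cachedbag.memusage. Not Pig's implementation; all names are assumed.
 */
public class CachedBagBudgetSketch {

    // Bytes one bag may keep in memory before spilling, assuming the fraction
    // is applied to maxHeapBytes and split evenly across bagCount bags.
    public static long perBagBudget(long maxHeapBytes, double memUsage, int bagCount) {
        if (memUsage <= 0 || memUsage > 1) {
            throw new IllegalArgumentException("memUsage must be in (0, 1]");
        }
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be >= 1");
        }
        return (long) (maxHeapBytes * memUsage / bagCount);
    }

    // Tuples that fit in the budget before a spill, for an assumed average size.
    public static long tuplesBeforeSpill(long budgetBytes, long avgTupleBytes) {
        return budgetBytes / avgTupleBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long avgTupleBytes = 200L; // hypothetical average in-memory tuple size
        for (double fraction : new double[] {0.1, 0.05, 0.02}) {
            long budget = perBagBudget(maxHeap, fraction, 1);
            System.out.printf("memusage=%.2f -> %d bytes, ~%d tuples before spilling%n",
                    fraction, budget, tuplesBeforeSpill(budget, avgTupleBytes));
        }
    }
}
```

Under these assumptions, each step down in the fraction shrinks the in-memory budget proportionally, so bags spill earlier and more often; the extra disk I/O is the slowdown mentioned in suggestion 3.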
Syed Wasti 2010-07-09, 23:01
Thejas M Nair 2010-07-23, 20:15
Syed Wasti 2010-07-28, 06:27
Thejas M Nair 2010-07-29, 00:29
Syed Wasti 2010-07-29, 18:10
Thejas M Nair 2010-07-29, 19:38