Re: Java heap error
Aah.. forgot to mention how to set that param in 3). While launching
Pig, provide it as a -D command-line switch, as follows:
pig -Dpig.cachedbag.memusage=0.02f myscript.pig
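
The same value can also be set in Pig's properties file rather than on the
command line (a sketch, assuming a default install where conf/pig.properties
is picked up from the classpath):

pig.cachedbag.memusage=0.02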

On Thu, Jul 8, 2010 at 17:45, Ashutosh Chauhan
<[EMAIL PROTECTED]> wrote:
> I would recommend the following things, in this order:
>
> 1) Increasing heap size should help (see the sketch after this list).
> 2) It seems you are on 0.7. There are a couple of memory fixes we have
> committed, both on the 0.7 branch and on trunk. Those should help as
> well. So, build Pig from either trunk or the 0.7 branch and use that.
> 3) Only if these don't help, try tuning the param
> pig.cachedbag.memusage. By default it is set at 0.1; lowering it
> should help. Try 0.05, then 0.02, and then further down. The downside
> is that the lower you go, the slower your query will run.
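>
> For 1), as a minimal sketch (assuming a Hadoop 0.20-era setup where task
> heap is controlled by mapred.child.java.opts; the 1 GB value below is just
> an illustration, not a recommendation):
>
> pig -Dmapred.child.java.opts=-Xmx1024m myscript.pig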
>
> Let us know if these changes get your query to completion.
>
> Ashutosh
>
> On Thu, Jul 8, 2010 at 15:48, Syed Wasti <[EMAIL PROTECTED]> wrote:
>> Thanks Ashutosh. Is there any workaround for this? Will increasing the
>> heap size help?
>>
>>
>> On 7/8/10 1:59 PM, "Ashutosh Chauhan" <[EMAIL PROTECTED]> wrote:
>>
>>> Syed,
>>>
>>> You are likely being hit by https://issues.apache.org/jira/browse/PIG-1442 .
>>> Your query and stacktrace look very similar to the ones in the jira
>>> ticket. This may get fixed in the 0.8 release.
>>>
>>> Ashutosh
>>>
>>> On Thu, Jul 8, 2010 at 13:42, Syed Wasti <[EMAIL PROTECTED]> wrote:
>>>> Sorry about the delay; I was held up with other things.
>>>> Here are the script and the errors below:
>>>>
>>>> AA = LOAD 'table1' USING PigStorage('\t') as
>>>> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>>>>
>>>> AB = FOREACH AA GENERATE ID, b, d, e, f, n, o;
>>>>
>>>> AC = FILTER AB BY o == 1;
>>>>
>>>> AD = GROUP AC BY (ID, b);
>>>>
>>>> AE = FOREACH AD { A = DISTINCT AC.d;
>>>>        GENERATE group.ID, (chararray) 'S' AS type, group.b, (int)
>>>> COUNT_STAR(AC) AS cnt, (int) COUNT(A) AS cnt_distinct; }
>>>>
>>>> The same steps are repeated to load 5 different tables, and then a UNION is
>>>> done on them (a sketch of the repeated pattern follows).
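>>>>
>>>> For illustration, the repeated pattern for the second table would look
>>>> like this ('table2' and the B* aliases are hypothetical placeholders,
>>>> not the real names):
>>>>
>>>> BA = LOAD 'table2' USING PigStorage('\t') as
>>>> (ID,b,c,d,e,f,g,h,i,j,k,l,m,n,o);
>>>> BB = FOREACH BA GENERATE ID, b, d, e, f, n, o;
>>>> BC = FILTER BB BY o == 1;
>>>> BD = GROUP BC BY (ID, b);
>>>> AF = FOREACH BD { B = DISTINCT BC.d;
>>>>        GENERATE group.ID, (chararray) 'S' AS type, group.b, (int)
>>>> COUNT_STAR(BC) AS cnt, (int) COUNT(B) AS cnt_distinct; }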
>>>>
>>>> Final_res = UNION AE, AF, AG, AH, AI;
>>>>
>>>> The actual number of columns will be 15; here I am showing it with one table.
>>>>
>>>> Final_table =   FOREACH Final_res GENERATE ID,
>>>>                (type == 'S' AND b == 1 ? cnt : 0) AS 12_tmp,
>>>>                (type == 'S' AND b == 2 ? cnt : 0) AS 13_tmp,
>>>>                (type == 'S' AND b == 1 ? cnt_distinct : 0) AS 12_distinct_tmp,
>>>>                (type == 'S' AND b == 2 ? cnt_distinct : 0) AS 13_distinct_tmp;
>>>>
>>>> It works fine until here; it is only after adding this last part of the
>>>> query that it starts throwing heap errors.
>>>>
>>>> grp_id =    GROUP Final_table BY ID;
>>>>
>>>> Final_data = FOREACH grp_id GENERATE group AS ID,
>>>> SUM(Final_table.12_tmp), SUM(Final_table.13_tmp),
>>>> SUM(Final_table.12_distinct_tmp), SUM(Final_table.13_distinct_tmp);
>>>>
>>>> STORE Final_data;
>>>>
>>>>
>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>  at java.util.ArrayList.<init>(ArrayList.java:112)
>>>>  at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
>>>>  at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
>>>>  at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:55)
>>>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
>>>>  at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:130)
>>>>  at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:289)
>>>>  at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>>>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>>  at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:11