Pig, mail # user - UDF problem: Java Heap space


Re: UDF problem: Java Heap space
Aniket Mokashi 2011-02-24, 23:49
Hi Jai,

Thanks for your email. I suspect the cause is Strings being created in a
tight loop, as you suggested. I have a loop in my UDF that does the following:

while ((startInd = someLog.indexOf('[', startInd)) > 0) {
    endInd = someLog.indexOf(']', startInd);
    if (endInd > 0) {
        category = someLog.substring(startInd, endInd + 1);
        cats.add(category);
    }
    startInd = endInd;
}
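
Editor's note: one possible cause of the heap error, assuming some input lines contain a '[' with no ']' after it, is that the loop above never terminates. In that case endInd is -1, startInd is set to -1, and String.indexOf treats a negative start index as 0, so the scan restarts from the beginning and keeps adding the same categories to cats until the heap is exhausted. The following is a minimal sketch of a loop that stops on an unmatched bracket. The class name BracketCategories and the method name extract are hypothetical and do not come from the original UDF.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketCategories {

    // Collects every "[...]" token in the input, brackets included,
    // mirroring the substring(startInd, endInd + 1) call in the loop above.
    static List<String> extract(String someLog) {
        List<String> cats = new ArrayList<>();
        int startInd = 0;
        // >= 0 (not > 0) so that a '[' at index 0 is not skipped.
        while ((startInd = someLog.indexOf('[', startInd)) >= 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd < 0) {
                // Unmatched '[': stop here instead of letting startInd
                // become -1, which would restart the scan from index 0.
                break;
            }
            cats.add(someLog.substring(startInd, endInd + 1));
            // Resume scanning just past the closing bracket.
            startInd = endInd + 1;
        }
        return cats;
    }

    public static void main(String[] args) {
        System.out.println(extract("..[a]...[b]..[a]..")); // prints [[a], [b], [a]]
        System.out.println(extract("..[a]...[b"));         // prints [[a]]
    }
}
```

If the real input never contains an unmatched '[', this is not the cause, and a heap dump would be the next thing to check.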

My jobs are failing in both local and MapReduce mode. The UDF works fine for a
smaller input (a few lines). Also, I checked that the length of someLog doesn't
exceed 10000.

Thanks,
Aniket
On Thu, February 24, 2011 3:58 am, Jai Krishna wrote:
> Sharing the code would be useful, as mentioned. Also of help would be the
> heap settings that the JVM had.
>
> However, off the top of my head, one common situation (esp. in text
> processing/tokenizing) is instantiating Strings in a tight loop.
>
> Besides, you could also exercise your UDF in a local JVM and take a heap
> dump or profile it. If your heap is less than 512M, you could use basic
> profiling via hprof/hat (see
> http://java.sun.com/developer/technicalArticles/Programming/HPROF.html ).
>
>
> Thanks,
> Jai
>
>
>
> On 2/24/11 9:26 AM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:
>
>
> Aniket, share the code?
> It really depends on how you create them.
>
>
> -D
>
>
> On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi
> <[EMAIL PROTECTED]> wrote:
>
>
>> I've written a simple UDF that parses a chararray (which looks like
>> ...[a].....[b]...[a]...) to capture the text inside brackets and return
>> it as a String such as a=2;b=1; and so on. The input chararrays are rarely
>> more than 1000 characters long and never more than 100000 (I've added
>> log.warn to my UDF to verify this). But I still see a Java heap error while
>> running this UDF (even in local mode, the job simply fails). My
>> assumption is that the maps and lists I use locally will be reclaimed by
>> the GC. Am I missing something?
>>
>> Thanks,
>> Aniket
>>
>>
>>
>
>