Pig >> mail # user >> UDF problem: Java Heap space


Aniket Mokashi 2011-02-24, 03:49
Dmitriy Ryaboy 2011-02-24, 03:56
Jai Krishna 2011-02-24, 08:58
Aniket Mokashi 2011-02-24, 23:49
Dmitriy Ryaboy 2011-02-25, 00:13
Daniel Dai 2011-02-25, 00:25
Aniket Mokashi 2011-02-25, 00:47
Re: UDF problem: Java Heap space
Thanks everyone for helping me out. I figured it was one of those logical
errors that lead to infinite loops. The indexOf operation doesn't always
return -1 on failure, which was causing this to get into an infinite
loop (I should have thought about this). I.e. indexOf('[', 187) would
return 187, and the loop would continue forever.
Thanks again,
Aniket
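[Editor's note: the failure mode described above can be sketched in a small, self-contained example. This is a hypothetical reconstruction, not the author's actual UDF: in Java, String.indexOf treats a negative fromIndex as 0, so a loop that neither guards the "no closing bracket" case nor advances past each match can rescan the same input forever. A guarded version, assuming the names someLog and cats from the thread:]

```java
import java.util.ArrayList;
import java.util.List;

public class BracketScan {
    // Extract every "[...]" token from someLog without risking an
    // infinite loop: bail out on an unmatched '[', and always advance
    // startInd past the ']' that was just consumed.
    static List<String> extract(String someLog) {
        List<String> cats = new ArrayList<>();
        int startInd = 0;
        while ((startInd = someLog.indexOf('[', startInd)) >= 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd < 0) {
                break; // unmatched '[': stop instead of looping forever
            }
            cats.add(someLog.substring(startInd, endInd + 1));
            startInd = endInd + 1; // advance past the ']'
        }
        return cats;
    }

    public static void main(String[] args) {
        System.out.println(extract("x[a]y[b]z[a]trailing["));
    }
}
```

[Note that assigning startInd = endInd when endInd is -1 would make the next indexOf('[', -1) restart from index 0, which matches the infinite loop described in this thread.]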

On Thu, February 24, 2011 7:47 pm, Aniket Mokashi wrote:
> This is a map side udf.
> pig script loads a log file and grabs contents inside angle brackets.
> a = load; b = foreach a generate F(a); dump b;
>
> I see the following on tasktrackers:
> 2011-02-23 18:01:25,992 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 5439488(5312K) used = 409337824(399743K) committed = 534118400(521600K) max = 715849728(699072K)
> 2011-02-23 18:01:26,102 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Usage threshold init = 5439488(5312K) used = 546751088(533936K) committed = 671547392(655808K) max = 715849728(699072K)
>
> I am trying out some changes in udf to see if they work.
>
>
> Thanks,
> Aniket
>
>
> On Thu, February 24, 2011 7:25 pm, Daniel Dai wrote:
>
>> Hi, Aniket,
>> What is your Pig script? Is the UDF in map side or reduce side?
>>
>> Daniel
>>
>> Dmitriy Ryaboy wrote:
>>
>>
>>> That's a max of 3.3K single-character strings. Even with the Java
>>> overhead, that shouldn't be more than a meg, right? None of these should
>>> make it out of young gen, assuming the list "cats" doesn't stick
>>> around outside the udf.
>>>
>>> On Thu, Feb 24, 2011 at 3:49 PM, Aniket Mokashi
>>> <[EMAIL PROTECTED]>wrote:
>>>
>>>> Hi Jai,
>>>>
>>>>
>>>>
>>>> Thanks for your email. I suspect it's the Strings-in-a-tight-loop
>>>> issue you suggested. I have a loop in my udf that does the
>>>> following.
>>>>
>>>> while ((startInd = someLog.indexOf('[', startInd)) > 0) {
>>>>     endInd = someLog.indexOf(']', startInd);
>>>>     if (endInd > 0) {
>>>>         category = someLog.substring(startInd, endInd + 1);
>>>>         cats.add(category);
>>>>     }
>>>>     startInd = endInd;
>>>> }
>>>>
>>>>
>>>> My jobs are failing in both local and mr mode. The UDF works fine
>>>> for a smaller input (a few lines). Also, I checked that the size of
>>>> someLog doesn't exceed 10000.
>>>>
>>>> Thanks,
>>>> Aniket
>>>>
>>>> On Thu, February 24, 2011 3:58 am, Jai Krishna wrote:
>>>>
>>>>> Sharing the code would be useful, as mentioned. Also of help would
>>>>> be the heap settings that the JVM had.
>>>>>
>>>>> However, off the top of my head, one common situation (esp. in
>>>>> text processing/tokenizing) is instantiating Strings in a tight
>>>>> loop.
>>>>>
>>>>> Besides, you could also exercise your UDF in a local JVM and take
>>>>> a heap dump / profile it. If your heap is less than 512M, you
>>>>> could use basic profiling via hprof/hat (see
>>>>> http://java.sun.com/developer/technicalArticles/Programming/HPROF.html).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Jai
>>>>>
>>>>> On 2/24/11 9:26 AM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Aniket, share the code?
>>>>> It really depends on how you create them.
>>>>>
>>>>> -D
>>>>>
>>>>> On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi
>>>>> <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>>> I've written a simple UDF that parses a chararray (which looks
>>>>>> like ...[a].....[b]...[a]...) to capture stuff inside brackets
>>>>>> and return it as a String: a=2;b=1; and so on. The input
>>>>>> chararrays are rarely more than 1000 characters and are never
>>>>>> more than 100000 (I've added log.warn in my udf to ensure this).
>>>>>> But I still see a java heap error while running this udf (even in
>>>>>> local mode, the job simply fails). My assumption is that maps and
>>>>>> lists that I use locally will be collected by gc. Am I missing
>>>>>> something?
>>>>>>
>>>>>> Thanks,
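[Editor's note: the counting behavior described in this first message (turning ...[a].....[b]...[a]... into a=2;b=1;) can be sketched roughly as below. The method name, the use of a LinkedHashMap, and the output formatting are assumptions for illustration, not the author's actual UDF:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BracketCount {
    // Count each [x] token in someLog and render the counts as
    // "x=2;y=1;", in first-seen order.
    static String summarize(String someLog) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int start = 0;
        while ((start = someLog.indexOf('[', start)) >= 0) {
            int end = someLog.indexOf(']', start);
            if (end < 0) {
                break; // unmatched '[': stop scanning
            }
            String key = someLog.substring(start + 1, end);
            counts.merge(key, 1, Integer::sum);
            start = end + 1; // advance past the ']'
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(summarize("...[a].....[b]...[a]..."));
    }
}
```

[Because everything here is method-local, these objects are indeed eligible for gc after each call, which is why the thread's eventual diagnosis was an infinite loop rather than a leak.]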