Pig, mail # user - PigServer memory leak


Re: PigServer memory leak
Ashutosh Chauhan 2010-03-10, 05:13
PigServer maintains some static state, and in the current implementation
it is not safe to reuse an instance across different queries. You should
create a new instance for every query.
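
To illustrate the "one instance per query" pattern, here is a minimal
sketch. It deliberately avoids the real Pig API: QueryEngine and
PerQueryRunner are hypothetical names made up for this example, and in a
real application the factory would construct a new PigServer and the
wrapper would delegate to it. Treat it as an outline of the lifecycle
under those assumptions, not as how Pig itself must be called.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the parts of a query engine this sketch needs.
// In a real application an implementation would wrap a PigServer instance.
interface QueryEngine extends AutoCloseable {
    void run(String script) throws Exception;

    @Override
    void close() throws Exception;
}

// Runs every script on its own freshly created engine, so no engine
// instance is ever shared between two queries.
final class PerQueryRunner {
    private final Supplier<QueryEngine> engineFactory;

    PerQueryRunner(Supplier<QueryEngine> engineFactory) {
        this.engineFactory = engineFactory;
    }

    void execute(String script) throws Exception {
        // try-with-resources closes the engine even if the script fails.
        try (QueryEngine engine = engineFactory.get()) {
            engine.run(script);
        }
    }
}
```

A scheduler job would call runner.execute(script) each time it fires, so
each run gets a fresh engine and releases it when the run ends.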

As for the memory leak: are you running exactly the same query over the
same dataset repeatedly? If so, and you run out of memory, then there is
a memory leak somewhere. But I doubt that's what you are doing. More
likely, the message you are seeing has nothing to do with PigServer
and is caused by the query and/or the dataset; that is, your query may
not be taking advantage of the optimizations in Pig. When you see that
message, run the same query using grunt or bin/pig and you should see the
same messages. Send the query you are firing; there might be a way to
optimize it and avoid those messages.
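
One rough way to check the "same query, same dataset, repeatedly" case
is to sample heap usage after each run and see whether it keeps climbing.
The sketch below uses only the standard java.lang.management API;
HeapGrowthCheck and sampleHeapAfterRuns are hypothetical names for this
example, and the task passed in is assumed to be whatever code runs your
query. The gc() call is only a request to the JVM, so the numbers are
indicative rather than exact.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapGrowthCheck {
    // Runs the task `runs` times and records the used heap (in bytes)
    // after each run. Steady growth across many runs suggests a leak.
    static long[] sampleHeapAfterRuns(Runnable task, int runs) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long[] usedBytes = new long[runs];
        for (int i = 0; i < runs; i++) {
            task.run();
            // Ask for a collection first so garbage that is merely
            // uncollected is less likely to be mistaken for a leak.
            memory.gc();
            usedBytes[i] = memory.getHeapMemoryUsage().getUsed();
        }
        return usedBytes;
    }
}
```

If the samples level off after the first few runs, the low-memory
messages are more likely tied to the query or dataset than to a leak.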

Hope that helps,
Ashutosh

On Tue, Mar 9, 2010 at 11:42, Bill Graham <[EMAIL PROTECTED]> wrote:
> Actually, upon closer investigation, re-using PigServer isn't working as
> well as I thought. I'm digging into the issue.
>
> To step back a bit though, I want to pose a different question: What is the
> intended usage of PigServer and PigContext w.r.t. its scope? Should a new
> instance of each be used for every job or is one or the other intended for
> re-use throughout the lifecycle of the VM instance?
>
> Digging into the code of PigServer it seems like it's intended to be used
> for a single script's execution only, but it's not entirely clear if that's
> the case.
>
>
>
> On Tue, Mar 9, 2010 at 9:29 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> hi,
>>
>> I've got a long running daemon application that periodically kicks off Pig
>> jobs via quartz (Pig version 0.4.0). It uses a wrapper class that initializes
>> an instance of PigServer before parsing and executing a pig script. As
>> implemented, the app would leak memory and after a while jobs would fail to
>> run with messages like this appearing in the logs:
>>
>> [Low Memory Detector] [INFO] SpillableMemoryManager.java:143 low memory
>> handler called
>>
>> To fix the issue, I created an instance of PigServer at application
>> initialization and I re-use that instance for all jobs for the life of the
>> daemon. Problem solved.
>>
>> So my question is, is this a bug in PigServer that it leaks memory when
>> multiple instances are created, or is that just improper use of the class?
>>
>> thanks,
>> Bill
>>
>