Pig >> mail # user >> Pig 0.10.0 slow startup


Prashant Kommireddi 2012-08-07, 22:44
Jonathan Coveney 2012-08-08, 05:07
Chun Yang 2012-08-08, 19:01
Jonathan Coveney 2012-08-08, 19:22
Chun Yang 2012-08-08, 22:04
Re: Pig 0.10.0 slow startup
Thanks for putting that together, Chun.

So, it looks like there are ~400 instantiations of the class, and the time
from the first instantiation to the last one is ~1.5s. Is that on the
order of the slowdown you're experiencing?

(Note: I'm testing with Pig 0.11. If your slowdown is much higher than that,
I'll test on Pig 0.10.)

Either way, it seems like the slowdown is directly attributable to UDF
invocations. Have you seen slowdowns much larger than this?
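
For anyone who wants to gather the same kind of numbers on their own UDFs, here is a minimal sketch of counting instantiations and timing the gap between the first and the last one. It is not the code used for the measurement above. InstantiationTracker, TrackedUdf and InstantiationDemo are hypothetical names, and TrackedUdf is a plain class standing in for a real UDF (in practice the record() call would go in the constructor of your EvalFunc subclass).

```java
/** Hypothetical helper: counts constructor calls and remembers first/last timestamps. */
final class InstantiationTracker {
    private static int count = 0;
    private static long firstNanos;
    private static long lastNanos;

    private InstantiationTracker() {}

    /** Call this from the constructor of the class being investigated. */
    static synchronized void record() {
        long now = System.nanoTime();
        if (count == 0) {
            firstNanos = now;   // remember when the first instance was created
        }
        lastNanos = now;        // always overwritten, so it ends up as the last one
        count++;
    }

    static synchronized int count() {
        return count;
    }

    /** Nanoseconds between the first and the last recorded instantiation. */
    static synchronized long elapsedNanos() {
        return count == 0 ? 0L : lastNanos - firstNanos;
    }

    static synchronized void reset() {
        count = 0;
    }
}

/** Stand-in for a UDF class; assumption: a real one would extend Pig's EvalFunc. */
class TrackedUdf {
    TrackedUdf() {
        InstantiationTracker.record();
    }
}

class InstantiationDemo {
    public static void main(String[] args) {
        // Simulate the front end constructing the same class many times.
        for (int i = 0; i < 400; i++) {
            new TrackedUdf();
        }
        System.out.printf("%d instantiations, %.3f ms from first to last%n",
                InstantiationTracker.count(),
                InstantiationTracker.elapsedNanos() / 1e6);
    }
}
```

Comparing the elapsed figure against the total startup delay shows whether the instantiations alone account for the slowdown or whether something else is going on as well.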

2012/8/8 Chun Yang <[EMAIL PROTECTED]>

> Hi Jonathan,
>
> Here is a more self-contained example than what I had before:
> http://ews.illinois.edu/~yang43/shared/students.tar.gz
>
> I wrote a trivial GFV class, but the slowdown still exists.
> students-a.pig starts up noticeably slower than students-b.pig .
>
> Thanks,
> Chun
>
> On 8/8/12 12:22 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>
> > Thanks for this info. Can you go ahead and paste the whole GFV class?
> >
> > Thanks
> >
> > 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
> >
> >> Thanks Jonathan,
> >>
> >> I've tried to produce an example script which exhibits the slowdown and
> >> posted it on Pastebin: http://pastebin.com/kTSsDUr3
> >>
> >> The slowdown seems to occur when we are using a lot of UDFs to parse our
> >> input data. Variant A in the script is noticeably slower than variant B in
> >> Pig 0.10, while performance is similar in Pig 0.9.1.
> >>
> >> I've pasted the exec() function of the GFV function on Pastebin as well:
> >> http://pastebin.com/FVnkQCJ5
> >>
> >> Please let us know if you need more details.
> >>
> >> Thanks,
> >> Chun
> >>
> >> On 8/7/12 10:07 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Can you guys give a script that has the issue? My tactic would be to use
> >>> some sort of profiler (we have access to YourKit for open source Pig
> >>> contribution work) and try and isolate what is triggering GC.
> >>>
> >>> 2012/8/7 Prashant Kommireddi <[EMAIL PROTECTED]>
> >>>
> >>>> Hi All,
> >>>>
> >>>> Just wanted to follow up on Chun's question. Several of our Pig users have
> >>>> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
> >>>> fine with 0.9.1. Anyone else facing similar issues?
> >>>>
> >>>> Thanks,
> >>>> Prashant
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the same
> >>>> script using the two Pig versions, 0.9.1 starts off fast and almost
> >>>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
> >>>> takes forever to submit the job. When I use the java option
> >>>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
> >>>> before and after the job is submitted to the cluster.
> >>>>
> >>>> Does anyone know what is causing this and/or how I might be able to
> >>>> troubleshoot it?
> >>>>
> >>>> I've uploaded truncated output showing when GC happens to
> >>>> Pastebin: http://pastebin.com/B8WTHW9r
> >>>>
> >>>> Thanks,
> >>>> Chun
> >>>>
> >>
> >>
>
>
Chun Yang 2012-08-09, 00:51
Jonathan Coveney 2012-08-09, 18:00
Chun Yang 2012-08-09, 22:32
Prashant Kommireddi 2012-08-10, 20:15
Dmitriy Ryaboy 2012-08-13, 23:44
Chun Yang 2012-07-26, 22:32