Pig, mail # user - Pig 0.10.0 slow startup


Prashant Kommireddi 2012-08-07, 22:44
Jonathan Coveney 2012-08-08, 05:07
Chun Yang 2012-08-08, 19:01
Re: Pig 0.10.0 slow startup
Jonathan Coveney 2012-08-08, 19:22
Thanks for this info. Can you go ahead and paste the whole GFV class?

Thanks

2012/8/8 Chun Yang <[EMAIL PROTECTED]>

> Thanks Jonathan,
>
> I've tried to produce an example script which exhibits the slowdown and
> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>
> The slowdown seems to occur when we use a lot of UDFs to parse our
> input data. Variant A in the script is noticeably slower than variant B in
> Pig 0.10, while performance is similar in Pig 0.9.1.
>
> I've pasted the exec() method of the GFV UDF on Pastebin as well:
> http://pastebin.com/FVnkQCJ5
>
> Please let us know if you need more details.
>
> Thanks,
> Chun
>
> On 8/7/12 10:07 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>
> > Can you guys share a script that has the issue? My tactic would be to use
> > some sort of profiler (we have access to YourKit for open source Pig
> > contribution work) and try to isolate what is triggering GC.
> >
> > 2012/8/7 Prashant Kommireddi <[EMAIL PROTECTED]>
> >
> >> Hi All,
> >>
> >> Just wanted to follow up on Chun's question. Several of our Pig users
> >> have been experiencing slow startups with Pig 0.10.0, while the same
> >> script runs fine with 0.9.1. Is anyone else facing similar issues?
> >>
> >> Thanks,
> >> Prashant
> >>
> >> Hi all,
> >>
> >> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I run the same
> >> script with both Pig versions, 0.9.1 starts off fast and almost
> >> immediately submits the job to the cluster. Pig 0.10.0, on the other
> >> hand, takes a very long time to submit the job. When I use the Java
> >> option -XX:+PrintGCDetails, I see that with 0.10.0 GC runs many times
> >> before and after the job is submitted to the cluster.
> >>
> >> Does anyone know what is causing this and/or how I might be able to
> >> troubleshoot it?
> >>
> >> I've uploaded truncated output showing when GC happens to
> >> Pastebin: http://pastebin.com/B8WTHW9r
> >>
> >> Thanks,
> >> Chun
> >>
>
>
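Besides reading the -XX:+PrintGCDetails log, one way to quantify the GC activity described in this thread is to read the JVM's own collector counters before and after a piece of work. The following is a minimal sketch, not Pig code: GcProbe, measure() and the allocation loop are hypothetical names chosen for illustration, and the loop merely stands in for per-record UDF work such as the GFV parsing. It relies only on the standard java.lang.management API (GarbageCollectorMXBean.getCollectionCount() and getCollectionTime()), and it reports collections only for the JVM it runs in, i.e. the client side where the slow startup was observed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig or Hadoop): counts JVM garbage
 * collections and accumulated GC time across a block of work.
 */
public class GcProbe {

    /** Sum of collection counts over all collectors; undefined (-1) values are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sum of approximate accumulated collection time in milliseconds; -1 values are skipped. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Runs the task and returns {collections, gcMillis} observed while it ran. */
    public static long[] measure(Runnable task) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();
        task.run();
        return new long[] {
            totalCollections() - countBefore,
            totalCollectionMillis() - millisBefore
        };
    }

    public static void main(String[] args) {
        // Illustrative stand-in workload: many short-lived objects, loosely
        // like a UDF building temporary strings for every input record.
        long[] delta = measure(() -> {
            List<String> sink = new ArrayList<>();
            for (int i = 0; i < 2_000_000; i++) {
                sink.add(Integer.toString(i));
                if (sink.size() > 10_000) {
                    sink.clear();
                }
            }
        });
        System.out.println("GC runs: " + delta[0] + ", GC time: " + delta[1] + " ms");
    }
}
```

Wrapping the same client-side step in measure() under Pig 0.9.1 and 0.10.0 would give directly comparable numbers; a profiler such as YourKit is still the tool for finding which allocations trigger the collections.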
Chun Yang 2012-08-08, 22:04
Jonathan Coveney 2012-08-09, 00:38
Chun Yang 2012-08-09, 00:51
Jonathan Coveney 2012-08-09, 18:00
Chun Yang 2012-08-09, 22:32
Prashant Kommireddi 2012-08-10, 20:15
Dmitriy Ryaboy 2012-08-13, 23:44
Chun Yang 2012-07-26, 22:32