-
Pig 0.10.0 slow startup
Prashant Kommireddi 2012-08-07, 22:44
Hi All,

Just wanted to follow up on Chun's question. Several of our Pig users have been experiencing slow start-ups with Pig 0.10.0, when the same script runs fine with 0.9.1. Anyone else facing similar issues?

Thanks,
Prashant

> Hi all,
>
> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the same script using the two Pig versions, 0.9.1 starts off fast and almost immediately submits the job to the cluster. On the other hand, Pig 0.10.0 takes forever to submit the job. When I use the java option -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times before and after the job is submitted to the cluster.
>
> Does anyone know what is causing this and/or how I might be able to troubleshoot it?
>
> I've uploaded truncated output showing when GC happens to Pastebin: http://pastebin.com/B8WTHW9r
>
> Thanks,
> Chun
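For anyone troubleshooting along the same lines: besides reading -XX:+PrintGCDetails output, cumulative GC counts and times can be read from inside the JVM through the standard java.lang.management API. The following is only a minimal sketch under stated assumptions. The class name GcSnapshot and the allocation loop are hypothetical stand-ins, not part of Pig, and where such a probe would be called in a real start-up path is left open.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Pig): sums GC activity for the current JVM.
final class GcSnapshot {
    final long collections; // total collections reported by all collectors
    final long millis;      // approximate accumulated collection time in ms

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        // Stand-in workload: allocate short-lived garbage, as heavy start-up work might.
        long allocated = 0;
        for (int i = 0; i < 200000; i++) {
            allocated += new byte[1024].length;
        }
        GcSnapshot after = take();
        System.out.println("bytes allocated: " + allocated);
        System.out.println("collections during workload: " + (after.collections - before.collections));
        System.out.println("GC time during workload (ms): " + (after.millis - before.millis));
    }
}
```

Taking one snapshot before and one after a suspect phase gives a per-phase delta, which is easier to compare across two versions than a raw GC log.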
-
Re: Pig 0.10.0 slow startup
Jonathan Coveney 2012-08-08, 05:07
Can you guys give a script that has the issue? My tactic would be to use some sort of profiler (we have access to YourKit for open source Pig contribution work) and try and isolate what is triggering GC.
-
Re: Pig 0.10.0 slow startup
Chun Yang 2012-08-08, 19:01
Thanks Jonathan,

I've tried to produce an example script which exhibits the slowdown and posted it on Pastebin: http://pastebin.com/kTSsDUr3

The slowdown seems to occur when we are using a lot of UDFs to parse our input data. Variant A in the script is noticeably slower than variant B in Pig 0.10, while performance is similar in Pig 0.9.1.

I've pasted the exec() function of the GFV function on Pastebin as well: http://pastebin.com/FVnkQCJ5

Please let us know if you need more details.

Thanks,
Chun
-
Re: Pig 0.10.0 slow startup
Jonathan Coveney 2012-08-08, 19:22
Thanks for this info. Can you go ahead and paste the whole GFV class?

Thanks
-
Re: Pig 0.10.0 slow startup
Chun Yang 2012-08-08, 22:04
Hi Jonathan,

Here is a more self-contained example than what I had before: http://ews.illinois.edu/~yang43/shared/students.tar.gz

I wrote a trivial GFV class, but the slowdown still exists. students-a.pig starts up noticeably slower than students-b.pig.

Thanks,
Chun
-
Re: Pig 0.10.0 slow startup
Jonathan Coveney 2012-08-09, 00:38
Thanks for putting that together, Chun.

So, it looks like there are ~400 instantiations of the class, and the time from the first instantiation to the last one is about 1.5s. Is that on the order of the slowdown you're experiencing?

(Note: I'm testing with Pig 11. If your slowdown is much higher than that, I'll test on Pig 10.)

Either way, it seems like the slowdown is directly attributable to UDF instantiations. Have you seen slowdowns much larger than this?
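The measurement described above (number of instantiations, and time between the first and the last) can be approximated with a static counter in the class under test. This is a minimal sketch, not the method actually used in the thread. CountedUdf is a hypothetical stand-in that does not extend Pig's UDF base class, and it assumes the constructors all run on a single thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a UDF class such as GFV; it does not extend any Pig class.
final class CountedUdf {
    private static final AtomicInteger INSTANCES = new AtomicInteger();
    private static volatile long firstNanos; // timestamp of the first construction
    private static volatile long lastNanos;  // timestamp of the most recent construction

    CountedUdf() {
        long now = System.nanoTime();
        if (INSTANCES.incrementAndGet() == 1) {
            firstNanos = now;
        }
        lastNanos = now;
    }

    static int instances() {
        return INSTANCES.get();
    }

    // Milliseconds elapsed between the first and the latest construction.
    static double spanMillis() {
        return instances() == 0 ? 0.0 : (lastNanos - firstNanos) / 1e6;
    }

    public static void main(String[] args) {
        // Assumption: roughly 400 instantiations, the figure mentioned in the thread.
        for (int i = 0; i < 400; i++) {
            new CountedUdf();
        }
        System.out.println("instances: " + instances());
        System.out.println("first-to-last span (ms): " + spanMillis());
    }
}
```

In a real UDF the same counter could be logged from the constructor, so that the count and span can be compared between two Pig versions running the same script.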
-
Re: Pig 0.10.0 slow startup
Chun Yang 2012-08-09, 00:51
Thanks Jonathan,

Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:

pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total

Seems like the first run is always slower, but subsequent runs are about the same:

pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total

A little more than 1.5s slowdown :)

Thanks,
Chun
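To put the wall-clock ("total") figures above side by side, the elapsed times can be converted to seconds and divided. The sketch below is illustrative only. WallClock is a hypothetical helper, and it assumes the "total" column uses either plain seconds or a minutes:seconds form such as 1:08.77.

```java
// Hypothetical helper for comparing the wall-clock ("total") figures quoted above.
final class WallClock {
    // Converts "4.153" or "1:08.77" (minutes:seconds) into seconds.
    static double toSeconds(String total) {
        double seconds = 0.0;
        for (String part : total.split(":")) {
            seconds = seconds * 60.0 + Double.parseDouble(part);
        }
        return seconds;
    }

    public static void main(String[] args) {
        double pig10FirstA = toSeconds("1:08.77"); // Pig 0.10, students-a.pig, first run
        double pig10A = toSeconds("35.017");       // Pig 0.10, students-a.pig, later run
        double pig10B = toSeconds("4.460");        // Pig 0.10, students-b.pig
        double pig9A = toSeconds("4.153");         // Pig 0.9.1, students-a.pig

        System.out.println("first run of students-a on 0.10 (s): " + pig10FirstA);
        System.out.println("0.10 vs 0.9.1 on students-a: " + (pig10A / pig9A));
        System.out.println("students-a vs students-b on 0.10: " + (pig10A / pig10B));
    }
}
```

With the numbers quoted above, both ratios come out at roughly a factor of eight, which is far more than the ~1.5s difference seen on Pig 11.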
-
Re: Pig 0.10.0 slow startup
Jonathan Coveney 2012-08-09, 18:00
Can you do me a favor and run the exact same stuff with pig11? Just to isolate whether this is an issue that has already been fixed. I will also try and run this on pig10, to see if I can see the same issue.
-
RE: Pig 0.10.0 slow startup
Chun Yang 2012-08-09, 22:32
I tried with pig11 (from git); timings for the two variants are more comparable.

stats for `pig11 -b -e 'explain -script students-a.pig'`
6.33s user 0.74s system 153% cpu 4.611 total
6.55s user 0.68s system 155% cpu 4.664 total
6.40s user 0.79s system 157% cpu 4.560 total
6.47s user 0.62s system 155% cpu 4.560 total

stats for `pig11 -b -e 'explain -script students-b.pig'`
5.66s user 0.62s system 169% cpu 3.707 total
5.69s user 0.53s system 165% cpu 3.758 total
5.44s user 0.70s system 165% cpu 3.706 total
5.68s user 0.51s system 166% cpu 3.708 total

So it looks like it was fixed somewhere for 0.11?
-
Re: Pig 0.10.0 slow startup
Prashant Kommireddi 2012-08-10, 20:15
Thanks Chun.

Jon, any idea what in 0.11 might have fixed it?
-
Re: Pig 0.10.0 slow startup
Dmitriy Ryaboy 2012-08-13, 23:44
Julien removed a dozen or so loader/storer instantiations. That can do it if you do work in constructors.

D
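The point about doing work in constructors suggests a general defensive pattern: keep the constructor cheap and defer expensive setup until the first real call, so that repeated instantiation during planning costs little. The sketch below illustrates that pattern only. LazyLookupFunc, buildTable and the lookup table are hypothetical and do not use Pig's actual base classes, and the thread does not confirm that this was the cause in Chun's case (the trivial GFV class still showed the slowdown on 0.10).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a UDF or loader; it does not use Pig's real base classes.
final class LazyLookupFunc {
    static int buildCount = 0;          // how many times the expensive setup actually ran
    private Map<String, Integer> table; // built on first use, not in the constructor

    LazyLookupFunc() {
        // Intentionally empty: instantiating this class many times stays cheap.
    }

    // Stand-in for expensive setup such as parsing a schema or loading a dictionary.
    private static Map<String, Integer> buildTable() {
        buildCount++;
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (int i = 0; i < 1000; i++) {
            m.put("field" + i, i);
        }
        return m;
    }

    // Returns the index for a field name, or null if the name is unknown.
    Integer exec(String name) {
        if (table == null) {
            table = buildTable(); // pay the setup cost once, on first real use
        }
        return table.get(name);
    }

    public static void main(String[] args) {
        LazyLookupFunc last = null;
        for (int i = 0; i < 400; i++) {
            last = new LazyLookupFunc();
        }
        System.out.println("setups after 400 instantiations: " + buildCount);
        System.out.println("lookup result: " + last.exec("field7"));
        System.out.println("setups after first exec: " + buildCount);
    }
}
```

The trade-off is that the first call on each instance pays the setup cost instead, which is usually acceptable when most instances created at planning time are never invoked.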
-
Pig 0.10.0 slow startup
Chun Yang 2012-07-26, 22:32
Hi all,

I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the same script using the two Pig versions, 0.9.1 starts off fast and almost immediately submits the job to the cluster. On the other hand, Pig 0.10.0 takes forever to submit the job. When I use the java option -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times before and after the job is submitted to the cluster.

Does anyone know what is causing this and/or how I might be able to troubleshoot it?

I've uploaded truncated output showing when GC happens to Pastebin: http://pastebin.com/B8WTHW9r

Thanks,
Chun