Pig, mail # user - Pig 0.10.0 slow startup


Re: Pig 0.10.0 slow startup
Prashant Kommireddi 2012-08-10, 20:15
Thanks Chun.

Jon, any idea what in 0.11 might have fixed it?

On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang <[EMAIL PROTECTED]> wrote:

> I tried with pig11 (from git), and the timings for the two variants are more
> comparable.
>
> stats for `pig11 -b -e 'explain -script students-a.pig'`
> 6.33s user 0.74s system 153% cpu 4.611 total
> 6.55s user 0.68s system 155% cpu 4.664 total
> 6.40s user 0.79s system 157% cpu 4.560 total
> 6.47s user 0.62s system 155% cpu 4.560 total
>
> stats for `pig11 -b -e 'explain -script students-b.pig'`
> 5.66s user 0.62s system 169% cpu 3.707 total
> 5.69s user 0.53s system 165% cpu 3.758 total
> 5.44s user 0.70s system 165% cpu 3.706 total
> 5.68s user 0.51s system 166% cpu 3.708 total
>
> So it looks like it was fixed somewhere in 0.11?
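A quick way to compare the pig11 runs above is to average the wall-clock ("total") column for each variant. This is a minimal Python sketch; the numbers are copied from the four runs of each script above and nothing else is assumed.

```python
from statistics import mean

# Wall-clock "total" seconds from the four pig11 runs of each variant (copied from above).
pig11_a = [4.611, 4.664, 4.560, 4.560]
pig11_b = [3.707, 3.758, 3.706, 3.708]

avg_a = mean(pig11_a)
avg_b = mean(pig11_b)

# Print both averages and the gap between the two variants.
print(f"students-a.pig: {avg_a:.3f}s, students-b.pig: {avg_b:.3f}s, gap: {avg_a - avg_b:.3f}s")
```

The gap comes out under one second, compared with roughly 30 seconds for the warm Pig 0.10 runs of students-a.pig quoted further down the thread.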
> ________________________________________
> From: Jonathan Coveney [[EMAIL PROTECTED]]
> Sent: Thursday, August 09, 2012 11:00 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Pig 0.10.0 slow startup
>
> Can you do me a favor and run the exact same stuff with pig11? Just to
> isolate if this is an issue that has been removed. I will also try and run
> this on pig10, to see if I can see the same issue.
>
> 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
>
> > Thanks Jonathan,
> >
> > Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
> >
> > pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
> >
> > pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
> >
> > pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total
> >
> > Seems like the first run is always slower, but subsequent runs are about the same:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total
> >
> > A little more than 1.5s slowdown :)
> >
> > Thanks,
> > Chun
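The "1.5s" above is tongue-in-cheek. Putting the warm Pig 0.10 totals for students-a.pig next to the students-b.pig run and the Pig 0.9.1 run makes the real gap explicit. A minimal Python sketch using only the totals quoted in this message; the cold first run (1:08.77) is left out on the assumption that it mostly reflects one-off warm-up cost.

```python
from statistics import mean

# Wall-clock totals (seconds) quoted in this message; the cold first run (1:08.77) is excluded.
pig10_a_warm = [35.017, 35.803]
pig10_b = 4.460
pig9_a = 4.153

avg_pig10_a = mean(pig10_a_warm)

# Absolute gap between the two variants on Pig 0.10, and the ratio against Pig 0.9.1 for variant A.
print(f"Pig 0.10, a vs b:     {avg_pig10_a - pig10_b:.1f}s slower")
print(f"Pig 0.10 vs 0.9.1, a: {avg_pig10_a / pig9_a:.1f}x slower")
```

This works out to roughly 31 seconds of extra wall-clock time, or about 8.5 times the Pig 0.9.1 run of the same script.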
> >
> > On 8/8/12 5:38 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for putting that together, Chun.
> > >
> > > So, it looks like there are ~400 instantiations of the class, and the time
> > > from the first instantiation to the last one is about 1.5s. Is that on the
> > > order of the slowdown you're experiencing?
> > >
> > > (note: I'm testing with Pig 11... if your slowdown is much higher than that,
> > > I'll test on Pig 10)
> > >
> > > Either way, it seems like the slowdown is directly attributable to UDF
> > > invocations. Have you seen slowdowns much larger than this?
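Counts like these (~400 instantiations, ~1.5s from first to last) are typically gathered by instrumenting the UDF's constructor; the thread does not say exactly how they were measured. The sketch below shows the general idea in Python. `InstrumentedUDF` and `report` are hypothetical names, not Pig API; in a real Java UDF such as GFV (presumably an `EvalFunc` subclass, given its exec() method), the same counter and timestamps would go in the constructor.

```python
import time


class InstrumentedUDF:
    """Hypothetical stand-in for a UDF class that records how often and when it is constructed."""

    instances = 0
    first_ts = None
    last_ts = None

    def __init__(self):
        now = time.perf_counter()
        cls = type(self)
        cls.instances += 1
        if cls.first_ts is None:
            cls.first_ts = now  # timestamp of the first construction
        cls.last_ts = now  # overwritten on every construction, so it ends as the last one

    @classmethod
    def report(cls):
        """Return (count, seconds from first to last construction, that span divided by count)."""
        if cls.instances == 0:
            return 0, 0.0, 0.0
        span = cls.last_ts - cls.first_ts
        return cls.instances, span, span / cls.instances


# Simulate the ~400 instantiations mentioned in the thread.
for _ in range(400):
    InstrumentedUDF()

count, span, per_instance = InstrumentedUDF.report()
print(count, f"{span:.4f}s", f"{per_instance * 1000:.4f}ms per instantiation")
```

With the thread's numbers, 1.5s spread over 400 instantiations works out to about 3.75 ms per instantiation.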
> > >
> > > 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
> > >
> > >> Hi Jonathan,
> > >>
> > >> Here is a more self-contained example than what I had before:
> > >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
> > >>
> > >> I wrote a trivial GFV class, but the slowdown still exists.
> > >> students-a.pig starts up noticeably slower than students-b.pig.
> > >>
> > >> Thanks,
> > >> Chun
> > >>
> > >> On 8/8/12 12:22 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
> > >>>
> > >>> Thanks
> > >>>
> > >>> 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
> > >>>
> > >>>> Thanks Jonathan,
> > >>>>
> > >>>> I've tried to produce an example script which exhibits the slowdown and
> > >>>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
> > >>>>
> > >>>> The slowdown seems to occur when we are using a lot of UDFs to parse our
> > >>>> input data. Variant A in the script is noticeably slower than variant B
> > >>>> in Pig 0.10, while performance is similar in Pig 0.9.1.
> > >>>>
> > >>>> I've pasted the exec() function of the GFV function on Pastebin as well:
> > >>>> http://pastebin.com/FVnkQCJ5
> > >>>>