Pig >> mail # user >> Pig 0.10.0 slow startup


Re: Pig 0.10.0 slow startup
Thanks Chun.

Jon, any idea what on 0.11 might have fixed it?

On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang <[EMAIL PROTECTED]> wrote:

> I tried with pig11 (from git); timings for the two variants are more
> comparable.
>
> stats for `pig11 -b -e 'explain -script students-a.pig'`
> 6.33s user 0.74s system 153% cpu 4.611 total
> 6.55s user 0.68s system 155% cpu 4.664 total
> 6.40s user 0.79s system 157% cpu 4.560 total
> 6.47s user 0.62s system 155% cpu 4.560 total
>
> stats for `pig11 -b -e 'explain -script students-b.pig'`
> 5.66s user 0.62s system 169% cpu 3.707 total
> 5.69s user 0.53s system 165% cpu 3.758 total
> 5.44s user 0.70s system 165% cpu 3.706 total
> 5.68s user 0.51s system 166% cpu 3.708 total
>
> So looks like it was fixed somewhere for 0.11?
> ________________________________________
> From: Jonathan Coveney [[EMAIL PROTECTED]]
> Sent: Thursday, August 09, 2012 11:00 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Pig 0.10.0 slow startup
>
> Can you do me a favor and run the exact same stuff with pig11? Just to
> isolate if this is an issue that has been removed. I will also try and run
> this on pig10, to see if I can see the same issue.
>
> 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
>
> > Thanks Jonathan,
> >
> > Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
> >
> > pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
> >
> > pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
> >
> > pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total
> >
> > Seems like the first run is always slower, but subsequent runs are about
> > the same:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total
> >
> > A little more than 1.5s slowdown :)
> >
> > Thanks,
> > Chun
> >
> > On 8/8/12 5:38 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for putting that together, Chun.
> > >
> > > So, it looks like there are ~400 instantiations of the class, and the
> > > time from the first instantiation to the last one is about 1.5s. Is
> > > that on the order of the slowdown you're experiencing?
> > >
> > > (note: I'm testing with Pig 11...if your slowdown is much higher than
> > > that, I'll test on Pig 10)
> > >
> > > Either way, it seems like the slowdown is directly attributable to UDF
> > > invocations. Have you seen slowdowns much larger than this?
> > >
> > > 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
> > >
> > >> Hi Jonathan,
> > >>
> > >> Here is a more self-contained example than what I had before:
> > >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
> > >>
> > >> I wrote a trivial GFV class, but the slowdown still exists.
> > >> students-a.pig starts up noticeably slower than students-b.pig .
> > >>
> > >> Thanks,
> > >> Chun
> > >>
> > >> On 8/8/12 12:22 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
> > >>>
> > >>> Thanks
> > >>>
> > >>> 2012/8/8 Chun Yang <[EMAIL PROTECTED]>
> > >>>
> > >>>> Thanks Jonathan,
> > >>>>
> > > >>>> I've tried to produce an example script which exhibits the slowdown
> > > >>>> and posted it on Pastebin: http://pastebin.com/kTSsDUr3
> > > >>>>
> > > >>>> The slowdown seems to occur when we are using a lot of UDFs to parse
> > > >>>> our input data. Variant A in the script is noticeably slower than
> > > >>>> variant B in Pig 0.10 while performance is similar in Pig 0.9.1
> > > >>>>
> > > >>>> I've pasted the exec() function of the GFV function on Pastebin as
> > > >>>> well: http://pastebin.com/FVnkQCJ5
> > >>>>