NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Pig 0.10.0 slow startup


Re: Pig 0.10.0 slow startup
Thanks for this info. Can you go ahead and paste the whole GFV class?

Thanks

2012/8/8 Chun Yang <[EMAIL PROTECTED]>

> Thanks Jonathan,
>
> I've tried to produce an example script which exhibits the slowdown and
> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>
> The slowdown seems to occur when we are using a lot of UDFs to parse our
> input data. Variant A in the script is noticeably slower than variant B in
> Pig 0.10, while performance is similar in Pig 0.9.1.
>
> I've pasted the exec() function of the GFV function on Pastebin as well:
> http://pastebin.com/FVnkQCJ5
>
> Please let us know if you need more details.
>
> Thanks,
> Chun
>
> On 8/7/12 10:07 PM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>
> > Can you guys give a script that has the issue? My tactic would be to use
> > some sort of profiler (we have access to YourKit for open source Pig
> > contribution work) and try and isolate what is triggering GC.
> >
> > 2012/8/7 Prashant Kommireddi <[EMAIL PROTECTED]>
> >
> >> Hi All,
> >>
> >> Just wanted to follow up on Chun's question. Several of our Pig users have
> >> been experiencing slow startups with Pig 0.10.0, when the same script runs
> >> fine with 0.9.1. Is anyone else facing similar issues?
> >>
> >> Thanks,
> >> Prashant
> >>
> >> Hi all,
> >>
> >> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the
> >> same script using the two Pig versions, 0.9.1 starts off fast and almost
> >> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
> >> takes forever to submit the job. When I use the Java option
> >> -XX:+PrintGCDetails, I see that with 0.10.0 the GC runs many times
> >> before and after the job is submitted to the cluster.
> >>
> >> Does anyone know what is causing this and/or how I might be able to
> >> troubleshoot it?
> >>
> >> I've uploaded truncated output showing when GC happens to Pastebin:
> >> http://pastebin.com/B8WTHW9r
> >>
> >> Thanks,
> >> Chun
> >>
>
>
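The thread compares GC activity between Pig 0.9.1 and 0.10.0 using -XX:+PrintGCDetails, and suggests a profiler for deeper isolation. As a lighter-weight complement (not something proposed in the thread), the standard java.lang.management API can count collections around a specific piece of work inside one JVM. This is a minimal sketch under stated assumptions: the class name GcCounter and the allocation-heavy workload in main are hypothetical stand-ins, and nothing here hooks into Pig itself.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: measures GC activity around a task using the
// standard GarbageCollectorMXBean API (not a Pig API).
public class GcCounter {

    // Sum of collection counts across all collectors in this JVM.
    // getCollectionCount() returns -1 when undefined, so negative values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    // Approximate accumulated GC time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Runs a task and returns how many collections happened while it ran.
    static long collectionsDuring(Runnable task) {
        long before = totalCollections();
        task.run();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        // Hypothetical workload: many short-lived allocations, standing in
        // for whatever step is suspected of triggering extra GC.
        long gcs = collectionsDuring(() -> {
            List<byte[]> sink = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                sink.add(new byte[1024]);
                if (sink.size() > 100) sink.clear();
            }
        });
        System.out.println("collections during task: " + gcs);
        System.out.println("total GC time so far (ms): " + totalCollectionMillis());
    }
}
```

Running the same measured step under each Pig version's classpath would give a rough count to compare, but a profiler such as the one mentioned above is still the better tool for finding which allocations trigger the collections.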