long parse time
We have some very long Pig scripts that run several times per day. We
believe that parsing each script takes a very long time (about 1 hour).
During this time, the pig command hangs before any output is displayed (I
am assuming this is the parsing phase).

My question is: can this be optimized by serializing the parsed
representation of the script to disk once parsing is complete, so that we
don't have to re-parse the script on every run (as long as the script itself
has not changed)? We could then load and run the parsed representation
directly. Since this is probably not a readily available feature, could
someone please point me to the place in the code where this intermediate
output can be intercepted?
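
To make the idea concrete, here is a minimal sketch of the caching scheme described above: key the cache on a hash of the script text, so an unchanged script reuses the stored result and any edit forces a fresh parse. The class `ScriptPlanCache`, its methods, and the `parser` callback are hypothetical names used for illustration; none of them are part of Pig. The sketch also assumes the parsed representation can be written with standard Java serialization, which has not been verified for Pig's plan classes.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

/** Hypothetical helper (not part of Pig): caches a parsed script on disk. */
final class ScriptPlanCache {
    private final Path cacheDir;

    ScriptPlanCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Hex SHA-256 of the script text; any change to the text yields a new key. */
    static String keyFor(String scriptText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(scriptText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the cached parse result for this script text if one exists;
     * otherwise runs the (assumed expensive) parser and stores its result.
     */
    @SuppressWarnings("unchecked")
    <T extends Serializable> T getOrParse(String scriptText, Function<String, T> parser)
            throws IOException, ClassNotFoundException {
        Path entry = cacheDir.resolve(keyFor(scriptText) + ".ser");
        if (Files.exists(entry)) {
            // Cache hit: deserialize instead of re-parsing.
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(entry))) {
                return (T) in.readObject();
            }
        }
        // Cache miss: parse once, then persist the result for later runs.
        T parsed = parser.apply(scriptText);
        Files.createDirectories(cacheDir);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(entry))) {
            out.writeObject(parsed);
        }
        return parsed;
    }
}
```

If the script uses parameter substitution or imports, the hash would need to cover the fully expanded text, since the same script file can otherwise produce different parsed results.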