Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> debug feature??


Re: debug feature??
Hello,

I understand the pain :)

Have you seen PigUnit and Penny?

http://pig.apache.org/docs/r0.10.0/test.html

On Fri, Oct 19, 2012 at 8:09 PM, Yang <[EMAIL PROTECTED]> wrote:

> One of the greatest pains I face when debugging Pig code is that the
> iteration cycles are really long:
> the applications for which we use Pig typically deal with large datasets,
> and if a Pig script involves many
> JOIN/GENERATE/FILTER steps, every step takes a lot of time, but every time
> I fix one step, I have to rerun from the start,
> which is wasted work.
>
> What I am doing so far to avoid wasting time re-running
> already-debugged steps is to
> manually divide my script into many small scripts and save the last
> variable out to HDFS; once a
> small script is debugged, I load the previous variable in the next
> small script.
>
> After all the small scripts are done, I manually stitch them back into the
> original big script.
>
>
> Is there a way to automate this? For example, add a marker around a particular
> step that tells Pig
> that the result is to be saved, and that all following steps are not to be
> executed. Then, when we move
> on to the next step, it knows where to pick up the last-saved data.
>
> Writing a preprocessor to do the above is not trivial, so I can't whip
> something up immediately, because it needs to figure out the
> schemas of the variables that propagate through the steps.
>
>
> Thanks
> Yang
>