Pig, mail # user - debug feature??


Re: debug feature??
Jagat Singh 2012-10-19, 09:18
Hello,

I understand the pain :)

Have you seen PigUnit and Penny?

http://pig.apache.org/docs/r0.10.0/test.html

On Fri, Oct 19, 2012 at 8:09 PM, Yang <[EMAIL PROTECTED]> wrote:

> One of the greatest pains I face when debugging Pig code is that the
> iteration cycles are really long:
> the applications for which we use Pig typically deal with large datasets,
> and if a Pig script involves many
> JOIN/GENERATE/FILTER steps, every step takes a lot of time. But every time
> I fix one step, I have to rerun from the start,
> which is wasteful.
>
> What I am doing so far, to avoid wasting time re-running
> already-debugged steps, is to
> manually divide my script into many small scripts and save the last
> variable out into HDFS. Once a
> small script is debugged, I load the previous variable in the next
> small script.
>
> After all the small scripts are done, I connect them back manually into the
> original big script.
>
>
> Is there a way to automate this? For example, add a mark around a particular
> step that tells Pig
> that the result is to be saved and that all following steps are not to be
> executed. Then when we move
> on to the next step, it knows where to pick up the last-saved data.
>
> Writing a preprocessor to do the above is not trivial, so I can't whip
> up something immediately, because it needs to figure out the
> schemas of variables that propagate through the steps.
>
>
> Thanks
> Yang
>
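
The manual split-and-reload workflow described in the quoted message above could be automated roughly as follows. This is only a minimal sketch, not an existing Pig feature: the `-- CHECKPOINT <alias>` comment marker, the `split_at_checkpoint` function, and the `tmp/checkpoints` directory are all made up for illustration. It also assumes a Pig version whose PigStorage accepts the '-schema' option (documented for 0.10 and later, as far as I know), so that the stored data carries its schema and the preprocessor does not have to infer it.

```python
import re

# Hypothetical marker (not a Pig feature): a comment line "-- CHECKPOINT <alias>"
MARKER = re.compile(r"^\s*--\s*CHECKPOINT\s+(\w+)\s*$")


def split_at_checkpoint(script, checkpoint_dir="tmp/checkpoints"):
    """Split a Pig script at the first checkpoint marker.

    Returns (head, tail):
      head = statements before the marker, plus a STORE of the marked alias
      tail = a LOAD of the stored alias, plus the statements after the marker
    """
    lines = script.splitlines()
    for i, line in enumerate(lines):
        match = MARKER.match(line)
        if match is None:
            continue
        alias = match.group(1)
        path = f"{checkpoint_dir}/{alias}"
        # Assumption: '-schema' makes PigStorage write a hidden schema file,
        # which a later PigStorage LOAD of the same path can pick up.
        store = f"STORE {alias} INTO '{path}' USING PigStorage('\\t', '-schema');"
        load = f"{alias} = LOAD '{path}' USING PigStorage('\\t');"
        head = "\n".join(lines[:i] + [store])
        tail = "\n".join([load] + lines[i + 1:])
        return head, tail
    raise ValueError("no '-- CHECKPOINT <alias>' marker found")


if __name__ == "__main__":
    example = "\n".join([
        "a = LOAD 'in' AS (x:int);",
        "b = FILTER a BY x > 0;",
        "-- CHECKPOINT b",
        "c = GROUP b BY x;",
        "DUMP c;",
    ])
    head, tail = split_at_checkpoint(example)
    print(head)
    print("----")
    print(tail)
```

Run the head script once to materialize the checkpoint, then iterate on the tail script only. The sketch handles a single marker and a single alias; a tail that also needs other aliases, or REGISTER/DEFINE/SET statements from the head, would need those carried over as well.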