Pig >> mail # user >> debug feature??


Re: debug feature??
Hi,

Ideally, you would first test the script on a small amount of data in
local mode, and only then run it on the big dataset to verify
correctness.
If that is not possible, you can save a relation at any point in your
script with a STORE statement so that you don't lose intermediate
results, and then remove the STOREs after debugging.
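As a rough illustration of the STORE approach, the Python sketch below inserts a STORE after the statement that defines each relation you want to inspect. It tags each generated line with a `-- DEBUG-STORE` comment so those STOREs can be stripped again mechanically. The function names, the tag and the output directory are hypothetical. The sketch also assumes one statement per line of the form `alias = ...;`, which real scripts often do not follow. The small-data run itself can be done in Pig's local mode (`pig -x local`).

```python
import re

# Hypothetical tag appended to every generated STORE so the lines can be
# found and removed later. This is our own convention, not a Pig feature.
DEBUG_TAG = "-- DEBUG-STORE"

# Matches the alias on the left-hand side of a statement like "clean = FILTER ...".
ALIAS_DEF = re.compile(r"\s*([A-Za-z_]\w*)\s*=")


def add_debug_stores(script: str, aliases: list[str], out_dir: str) -> str:
    """Insert a STORE right after the statement defining each wanted alias.

    Simplification: assumes each statement fits on one line and ends with ';'.
    """
    wanted = set(aliases)
    out = []
    for line in script.splitlines():
        out.append(line)
        m = ALIAS_DEF.match(line)
        if m and m.group(1) in wanted and line.rstrip().endswith(";"):
            alias = m.group(1)
            # Pig treats everything after "--" as a comment, so the tag is harmless.
            out.append(f"STORE {alias} INTO '{out_dir}/{alias}'; {DEBUG_TAG}")
    return "\n".join(out)


def strip_debug_stores(script: str) -> str:
    """Remove every line generated by add_debug_stores, leaving other STOREs alone."""
    return "\n".join(
        line for line in script.splitlines() if not line.rstrip().endswith(DEBUG_TAG)
    )
```

Because only tagged lines are removed, hand-written STOREs in the script survive the cleanup step.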

Best Regards, Ruslan

On Fri, Oct 19, 2012 at 1:18 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I understand the pain :)
>
> Have you seen PigUnit and Penny?
>
> http://pig.apache.org/docs/r0.10.0/test.html
>
>
>
> On Fri, Oct 19, 2012 at 8:09 PM, Yang <[EMAIL PROTECTED]> wrote:
>
>> One of the greatest pains I face when debugging Pig code is that the
>> iteration cycles are really long. The applications for which we use Pig
>> typically deal with large datasets, and if a Pig script involves many
>> JOIN/GENERATE/FILTER steps, every step takes a lot of time. Yet every
>> time I fix one step, I have to rerun the script from the start, which
>> wastes time on steps that already work.
>>
>> What I am doing so far to avoid wasting time re-running already-debugged
>> steps is to manually divide my script into many small scripts and save
>> the last variable (relation) of each one to HDFS. Once a small script is
>> debugged, I load that saved variable in the next small script.
>>
>> After all the small scripts are done, I manually connect them back into
>> the original big script.
>>
>>
>> Is there a way to automate this? For example, I could add a mark around
>> a particular step that tells Pig to save its result and not to execute
>> any of the following steps. Then, when we move on to the next step, Pig
>> would know where to pick up the last-saved data.
>>
>> Writing a preprocessor to do this is not trivial, so I can't whip
>> something up immediately: it would need to figure out the schemas of the
>> variables that propagate through the steps.
>>
>>
>> Thanks
>> Yang
>>
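One possible shape for the preprocessor Yang describes is sketched below in Python. This is not an existing Pig feature. It assumes a hypothetical `-- CHECKPOINT <alias>` comment convention and splits the script into stage scripts at each marker. Each stage ends by storing the checkpoint relation with PigStorage's `'-schema'` option, which writes a hidden `.pig_schema` file next to the data. When the next stage LOADs the same path with PigStorage, Pig should pick up that schema without an AS clause, which addresses the schema-propagation concern. Check that your Pig version supports `-schema`. The sketch also assumes later stages reference only the checkpointed relation; any other alias used downstream would need its own checkpoint.

```python
import re

# Hypothetical marker convention (not a Pig feature): a line such as
#   -- CHECKPOINT joined
# ends a stage and names the relation whose result should be saved.
CHECKPOINT = re.compile(r"^\s*--\s*CHECKPOINT\s+([A-Za-z_]\w*)\s*$")


def split_at_checkpoints(script: str, base_dir: str) -> list[str]:
    """Split one Pig script into stage scripts at CHECKPOINT markers.

    Each stage ends by storing its checkpoint relation with PigStorage's
    '-schema' option; the next stage starts by loading it back and relies
    on PigStorage reading the saved .pig_schema instead of an AS clause.
    """
    stages: list[str] = []
    current: list[str] = []
    for line in script.splitlines():
        m = CHECKPOINT.match(line)
        if not m:
            current.append(line)
            continue
        alias = m.group(1)
        path = f"{base_dir}/{alias}"
        # '\\t' emits a literal \t, which Pig reads as a tab delimiter.
        current.append(f"STORE {alias} INTO '{path}' USING PigStorage('\\t', '-schema');")
        stages.append("\n".join(current))
        # The next stage resumes from the saved data instead of recomputing it.
        current = [f"{alias} = LOAD '{path}' USING PigStorage('\\t');"]
    stages.append("\n".join(current))
    return stages
```

To iterate on one step, run the earlier stages once, then rerun only the stage that contains the step you are fixing. Every later stage starts from saved data rather than from the raw input.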