Pig user mailing list: debug feature??


Yang 2012-10-19, 09:09
Jagat Singh 2012-10-19, 09:18
Yang 2012-10-19, 19:01
Re: debug feature??
Some testing tips:

1) Parameterize your LOAD/STORE statements so that, even when you have
to run in Hadoop mode, it's easy to switch to debug inputs/outputs (and
debug loaders and storers). It's vastly preferable to test in local
mode when possible, since the iterations are so much faster.

2) It's a good thing that PigUnit makes you test small pieces of code!
Factor your logic out into macros so that you can write unit tests for
them; instead of copying and pasting code, use macros and the IMPORT
statement.

3) If you are on Pig 0.11, try using mock.Storage (see
https://issues.apache.org/jira/browse/PIG-2650) to automatically
create inputs and examine outputs in your unit tests.
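The three tips combine into one pattern: keep the logic in a reusable unit, pass I/O in as parameters, and let tests swap in in-memory data. The Python sketch below is a hypothetical model of that pattern, not Pig code: `active_users` plays the role of a macro (tip 2), `run_report` receives its loader, storer, and locations as parameters (tip 1), and `MemoryStorage` is an in-memory stand-in for what mock.Storage does in Pig unit tests (tip 3). All names and sample data are assumptions for illustration. In Pig itself, tip 1 typically means writing `LOAD '$INPUT'` and `STORE ... INTO '$OUTPUT'` in the script and supplying the values with `pig -x local -param INPUT=... -param OUTPUT=... script.pig`.

```python
"""Hypothetical Python model of the three Pig testing tips (not Pig code)."""
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # assumed record shape: (user, event_count)


def active_users(rows: List[Row], min_events: int) -> List[Row]:
    """Tip 2: the reusable unit (a macro in Pig). Production and tests call
    this same function, so nothing is copy-pasted between them."""
    return sorted(r for r in rows if r[1] >= min_events)


def run_report(load: Callable[[str], List[Row]],
               store: Callable[[str, List[Row]], None],
               input_loc: str, output_loc: str) -> None:
    """Tip 1: I/O is passed in, so the same pipeline can read debug inputs
    and write debug outputs without editing the pipeline body."""
    store(output_loc, active_users(load(input_loc), min_events=3))


class MemoryStorage:
    """Tip 3: an in-memory loader/storer, similar in spirit to mock.Storage:
    tests seed named inputs and inspect named outputs directly."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Row]] = {}

    def load(self, loc: str) -> List[Row]:
        return list(self.data[loc])

    def store(self, loc: str, rows: List[Row]) -> None:
        self.data[loc] = list(rows)


if __name__ == "__main__":
    mem = MemoryStorage()
    mem.data["events"] = [("alice", 5), ("bob", 1), ("carol", 3)]
    run_report(mem.load, mem.store, "events", "report")
    print(mem.data["report"])  # [('alice', 5), ('carol', 3)]
```

Because `run_report` never hard-codes its I/O, the same function body runs against test fixtures and against real data; only the arguments change.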

D

On Fri, Oct 19, 2012 at 12:01 PM, Yang <[EMAIL PROTECTED]> wrote:
> I am using PigUnit, but it's somewhat limited: it can run only in local mode,
> so I can't find issues that only show up with fairly large test data. Also,
> you have to manually cut small snippets out of your original code, and after
> you have verified that a snippet works, you have to copy-paste it back into
> the production code, which can introduce copy-paste errors.
> Compared to Java JUnit, this is really very crude: in Java you have a class,
> and you can unit-test its individual methods directly, instead of having to
> copy-paste and create a special "test version" of that class.
>
>
> Overall, I feel that testability is an area where Pig could invest a lot
> more effort, and doing so would greatly help its wider adoption. Some
> other tools (Cascading, Cascalog, etc.) advertise testability as one of
> their important features.
>
> Let me check out Penny... thanks.
>
> On Fri, Oct 19, 2012 at 2:18 AM, Jagat Singh <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I understand the pain :)
>>
>> Have you seen PigUnit and Penny?
>>
>> http://pig.apache.org/docs/r0.10.0/test.html
>>
>>
>>
>> On Fri, Oct 19, 2012 at 8:09 PM, Yang <[EMAIL PROTECTED]> wrote:
>>
>> > One of the greatest pains I face when debugging Pig code is that the
>> > iteration cycles are really long. The applications we use Pig for
>> > typically deal with large datasets, and if a Pig script involves many
>> > JOIN/GENERATE/FILTER steps, every step takes a lot of time; yet every
>> > time I fix one step, I have to rerun the whole script from the start,
>> > which wastes time re-running steps that already work.
>> >
>> > What I am doing so far to avoid re-running already-debugged steps is to
>> > manually divide my script into many small scripts, each of which stores
>> > its last relation into HDFS. Once a small script is debugged, the next
>> > small script loads that stored relation.
>> >
>> > After all the small scripts are done, I manually stitch them back into
>> > the original big script.
>> >
>> >
>> > Is there a way to automate this? For example, I could put a mark around
>> > a particular step to tell Pig that its result should be saved and that
>> > none of the following steps should be executed; then, when I move on to
>> > the next step, Pig would know where to pick up the last-saved data.
>> >
>> > Writing a preprocessor to do the above is not trivial (it needs to
>> > figure out the schemas of the relations that propagate through the
>> > steps), so I can't whip one up right away.
>> >
>> >
>> > Thanks
>> > Yang
>> >
>>
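The checkpoint-and-resume workflow Yang describes can be approximated outside Pig by a small driver that saves each named step's output and, on a rerun, loads saved outputs instead of recomputing them. This Python sketch is hypothetical: the function `run_with_checkpoints`, the JSON files in a local directory (standing in for intermediate relations stored to HDFS), and the `stop_after` parameter are assumptions, not Pig features. In Pig, the manual equivalent is a `STORE` at the checkpoint and a `LOAD ... AS (...)` with an explicit schema in the next script.

```python
"""Hypothetical checkpoint/resume driver for a sequence of named steps."""
import json
import os
import tempfile
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (step name, transformation)


def run_with_checkpoints(steps: List[Step], data: Any, ckpt_dir: str,
                         stop_after: Optional[str] = None) -> Any:
    """Run steps in order, saving each result as <ckpt_dir>/<name>.json.

    A step whose checkpoint file already exists is not recomputed; its saved
    result is loaded instead. If stop_after names a step, execution halts
    right after that step, like marking the step you are debugging.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    for name, fn in steps:
        path = os.path.join(ckpt_dir, name + ".json")
        if os.path.exists(path):
            # Already debugged: reuse the saved result.
            with open(path) as f:
                data = json.load(f)
        else:
            data = fn(data)
            with open(path, "w") as f:
                json.dump(data, f)
        if name == stop_after:
            break
    return data


if __name__ == "__main__":
    calls = []

    def parse(rows):
        calls.append("parse")
        return [r.split(",") for r in rows]

    def keep_big(rows):
        calls.append("keep_big")
        return [r for r in rows if int(r[1]) > 10]

    steps = [("parse", parse), ("keep_big", keep_big)]
    raw = ["a,5", "b,20"]
    with tempfile.TemporaryDirectory() as d:
        run_with_checkpoints(steps, raw, d, stop_after="parse")  # debug step 1
        result = run_with_checkpoints(steps, raw, d)  # step 1 comes from disk
        print(result, calls)  # [['b', '20']] ['parse', 'keep_big']
```

To re-run a step after fixing it, delete its checkpoint file and the files of every later step; otherwise the driver silently reuses stale results.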
Yang 2012-10-23, 18:11
Yang 2012-11-07, 21:05
Ruslan Al-Fakikh 2012-10-22, 12:55
Ruslan Al-Fakikh 2012-10-19, 13:04
Yang 2012-10-19, 18:57