Re: debug feature??
OK, I found this practice to be useful:
I divide my code into sections, each section implemented as a macro.

Then I debug each macro separately; at the end of each macro, I manually
write its output vars into tmp storage. Then for each macro, I write a
corresponding "*_fake.pig" macro, which has the same signature but
populates the same return vars by loading them from the tmp storage.

Then, after I am done with one section, I swap out the IMPORT statement to
import the *_fake.pig script instead, so that the same computation is not
done again.
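
A minimal Pig Latin sketch of this real/fake macro swap. The macro name
clean_events, its fields, and the tmp path are hypothetical; only the
structure (same signature, STORE to tmp, swap the IMPORT) follows the
practice described above.

-- clean_events.pig: the real macro for this section (expensive to re-run)
DEFINE clean_events(raw) RETURNS cleaned {
    filtered = FILTER $raw BY event_type IS NOT NULL;
    $cleaned = FOREACH filtered GENERATE user_id, event_type, ts;
};

-- clean_events_fake.pig: same signature, but just reloads the saved output
DEFINE clean_events(raw) RETURNS cleaned {
    $cleaned = LOAD 'tmp/cleaned_events'
        AS (user_id:chararray, event_type:chararray, ts:long);
};

-- main script: swap which file is imported once this section is debugged
IMPORT 'clean_events.pig';          -- while debugging: the real computation
-- IMPORT 'clean_events_fake.pig';  -- afterwards: reload the stored result

raw = LOAD 'input/events'
      AS (user_id:chararray, event_type:chararray, ts:long);
cleaned = clean_events(raw);

-- persist the section's output; drop this STORE (and switch the IMPORT)
-- once the section is debugged, so the computation is not done again
STORE cleaned INTO 'tmp/cleaned_events';

Because both files define a macro with the same signature, the downstream
statements do not change at all when the IMPORT is swapped.
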
On Tue, Oct 23, 2012 at 11:11 AM, Yang <[EMAIL PROTECTED]> wrote:

> Nice, thanks
>
> Macros and mock.Storage() are both new to me; I believe they will help a lot
>
>
> On Mon, Oct 22, 2012 at 5:32 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> Some testing tips:
>>
>> 1) parametrize your load/store statements so that if you have to run
>> in hadoop mode, it's easy to switch to debug inputs / outputs (and
>> debug input/output loaders and storers). It's vastly preferable to
>> test in local mode when possible, since the iterations are so much
>> faster.
>>
>> 2) it's a good thing that PigUnit makes you test small pieces of code!
>> Factor out macros so that you can create unit tests; don't copy and
>> paste code: use macros and the import statement.
>>
>> 3) Try using mock.Storage (see
>> https://issues.apache.org/jira/browse/PIG-2650) to automatically
>> create inputs and examine outputs in your unit tests, if you are on
>> Pig 0.11.
>>
>> D
>>
>> On Fri, Oct 19, 2012 at 12:01 PM, Yang <[EMAIL PROTECTED]> wrote:
>> > I am using PigUnit, but it's somewhat limited: it can run only in local
>> > mode, so I can't find issues that come with fairly large test data; and
>> > you have to create small snippets of code that you cut out manually from
>> > your original code, so after you have tested a snippet and found it fine,
>> > you have to copy-paste it back into the production code, which introduces
>> > possible copy-paste errors. If you compare this to Java JUnit, this is
>> > really very crude: in Java you have a class, and you can run JUnit tests
>> > on individual methods of the class, instead of having to copy-paste and
>> > create a special "test version" of that class.
>> >
>> >
>> > Overall, I feel that testability is an area where Pig could spend a lot
>> > more effort, and it would greatly benefit its wider adoption. Some other
>> > tools (Cascading, Cascalog, etc.) advertise testability as one of their
>> > important features.
>> >
>> > Let me check out Penny... thanks
>> >
>> > On Fri, Oct 19, 2012 at 2:18 AM, Jagat Singh <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hello,
>> >>
>> >> I understand the pain :)
>> >>
>> >> Have you seen PigUnit and Penny?
>> >>
>> >> http://pig.apache.org/docs/r0.10.0/test.html
>> >>
>> >>
>> >>
>> >> On Fri, Oct 19, 2012 at 8:09 PM, Yang <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > One of the greatest pains I face with debugging Pig code is that the
>> >> > iteration cycles are really long: the applications for which we use
>> >> > Pig typically deal with large datasets, and if a Pig script involves
>> >> > many JOIN/generate/filter steps, every step takes a lot of time, but
>> >> > every time I fix one step, I have to run from the start, which is
>> >> > meaningless.
>> >> >
>> >> > What I am doing so far, to reduce the wasted time spent re-running
>> >> > already-debugged steps, is to manually divide my script into many
>> >> > small scripts and save the last variable out into HDFS; once a small
>> >> > script is debugged, I load that variable in the next small script.
>> >> >
>> >> > After all the small scripts are done, I connect them back manually
>> >> > into the original big script.
>> >> >
>> >> > Is there a way to automate this? For example, add a mark around a
>> >> > particular step that tells Pig that the result is to be saved, and
>> >> > all following steps are not
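
A minimal sketch of the parameterized load/store idea from Dmitriy's first
tip quoted above. The parameter names, the paths, and the script name
click_counts.pig are all hypothetical: %default points the script at small
local test files, and -param overrides point it at the real HDFS data for a
hadoop-mode run.

%default INPUT  'test/data/events.txt'
%default OUTPUT 'test/out/click_counts'

events = LOAD '$INPUT'
         AS (user_id:chararray, event_type:chararray, ts:long);
clicks = FILTER events BY event_type == 'click';
counts = FOREACH (GROUP clicks BY user_id)
         GENERATE group AS user_id, COUNT(clicks) AS n;
STORE counts INTO '$OUTPUT';

Iterate in local mode against the bundled test files:

    pig -x local click_counts.pig

and run against the full data on the cluster only once the logic looks right:

    pig -param INPUT=/data/events -param OUTPUT=/reports/click_counts click_counts.pig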