Pig >> mail # user >> Is pig maddening to work with because it's so slow?


Re: Is pig maddening to work with because it's so slow?
Seconded for PigUnit.

As for a faster debugging procedure, I've gone modular. First, I JUnit test
individual UDFs against their functional requirements and use cases a
priori. Then I mock up my whiteboard workflow as multiple logical blocks
(one Pig script file per block, each tested on its own), start pig -x local,
and enter each aliased line of a block one at a time, with a DESCRIBE after
each. This confirms that the syntax, schemas, re-aliasing, etc. are correct,
and you can merge the logical blocks back together for optimization once
they are complete.
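
To make the line-by-line routine concrete, here is a minimal sketch of one logical block as it might be typed into the grunt shell. The file name sample_logs.tsv, the field names, and the 500 ms threshold are all hypothetical, chosen only for illustration.

```pig
-- Started with: pig -x local
-- Assumption: sample_logs.tsv is a small tab-separated local file
-- with three columns (user, url, response time in ms).
logs = LOAD 'sample_logs.tsv' USING PigStorage('\t')
       AS (user:chararray, url:chararray, ms:long);
DESCRIBE logs;      -- check the declared field names and types

slow = FILTER logs BY ms > 500L;
DESCRIBE slow;      -- FILTER should leave the schema unchanged

by_user = GROUP slow BY user;
DESCRIBE by_user;   -- expect a 'group' field plus a bag named 'slow'

counts = FOREACH by_user GENERATE group AS user, COUNT(slow) AS slow_hits;
DESCRIBE counts;    -- confirm the re-aliased fields: user, slow_hits
```

DESCRIBE only prints the schema and does not launch a job, so each check comes back almost immediately, and a typo or schema mismatch shows up at the line that caused it.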

Once a block is completed, you can also run ILLUSTRATE on it to
spot-check results, but be forewarned: I've had ILLUSTRATE fail
prematurely on larger scripts because of their complexity.
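
A rough sketch of that spot-check, again with a hypothetical input file and field names. The LIMIT/DUMP fallback at the end is my own suggestion for when ILLUSTRATE gives up, not something the step above depends on.

```pig
-- Assumption: same hypothetical sample_logs.tsv as a small local input.
logs = LOAD 'sample_logs.tsv' USING PigStorage('\t')
       AS (user:chararray, url:chararray, ms:long);
by_user = GROUP logs BY user;
counts = FOREACH by_user GENERATE group AS user, COUNT(logs) AS hits;

-- Show example rows flowing through each step that leads to 'counts'.
ILLUSTRATE counts;

-- Possible fallback if ILLUSTRATE fails on a larger block:
-- run the block on the small input and print a few rows.
peek = LIMIT counts 10;
DUMP peek;
```

If ILLUSTRATE fails on the last alias of a block, trying it on an earlier alias in the same block can help narrow down which step it cannot handle.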

Hope this helps,

-Dan
On Tue, May 20, 2014 at 3:26 PM, Suraj Nayak <[EMAIL PROTECTED]> wrote:
 