Pig >> mail # user >> PIG + Junit


Re: PIG + Junit
Hey Todd, we run our tests against entire Pig scripts using some helper classes we built. Basically, they preprocess the variables and then call registerScript, but the test looks like this:

    @Before
    public void setUp() throws Exception {
        Helper.delete(OUT_FILE);
        runner = new PigRunner();
    }

    @Test
    public void testRecordCount() throws Exception {
        runner.execute("myscript.pig", "param1=foo", "param2=bar");

        Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
        assertEquals(41L, Helper.countTuples(tuples));
    }

It's been very useful for us to test this way.  Would love to see more chatter about other techniques.
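
For anyone curious what the "preprocess the variables" step could look like, here is a minimal sketch in plain Java. It is not the helper used above: the class name ParamSubstitution and both method names are hypothetical, and it assumes parameters appear in the script as simple $name placeholders and are passed as key=value strings, as in the execute() call. Pig's own parameter substitution handles more cases than this does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the class used in the message above.
public class ParamSubstitution {
    // Matches $name where name starts with a letter or underscore, so positional
    // field references such as $0 are left untouched.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Turns "key=value" arguments into a map, splitting on the first '='.
    public static Map<String, String> parseParams(String... pairs) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    // Replaces each $name with its value and fails loudly on an undefined name.
    public static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String script = "a = load '$param1'; b = filter a by $0 == '$param2';";
        System.out.println(substitute(script, parseParams("param1=foo", "param2=bar")));
    }
}
```

Failing on an undefined parameter, rather than leaving the placeholder in place, tends to surface a misconfigured test before the script ever reaches Pig.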

On Jul 20, 2010, at 3:26 PM, ToddG wrote:
> I'd like to include running various PIG scripts in my continuous build system. Of course, I'll only use small datasets for this, and in the beginning, I'll only target a local machine instance. However, this brings up several questions:
>
>
> Q: What's the best way to run PIG from Java? Here's what I'm doing, following a pattern I found in some of the pig tests:
>
> 1. Create Pig resources in a base class (shamelessly copied from PigExecTestCase):
>
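>    // Assumes LOCAL and MAPREDUCE are static imports from org.apache.pig.ExecType.
>    protected ExecType execType = LOCAL;
>    protected MiniCluster cluster;
>    protected PigServer pigServer;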
>
>    @Before
>    public void setUp() throws Exception {
>
>        String execTypeString = System.getProperty("test.exectype");
>        if(execTypeString!=null && execTypeString.length()>0){
>            execType = PigServer.parseExecType(execTypeString);
>        }
>        if(execType == MAPREDUCE) {
>            cluster = MiniCluster.buildCluster();
>            pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>        } else {
>            pigServer = new PigServer(LOCAL);
>        }
>    }
>
> 2. Test classes subclass this to get access to the MiniCluster and PigServer (copied from TestPigSplit):
>
>    @Test
>    public void notestLongEvalSpec() throws Exception{
>        inputFileName = "notestLongEvalSpec-input.txt";
>        createInput(new String[] {"0\ta"});
>
>        pigServer.registerQuery("a = load '" + inputFileName + "';");
>        for (int i=0; i< 500; i++){
>            pigServer.registerQuery("a = filter a by $0 == '1';");
>        }
>        Iterator<Tuple> iter = pigServer.openIterator("a");
>        while (iter.hasNext()){
>            throw new Exception();
>        }
>    }
>
> 3. ERROR
>
> This pattern works for simple PIG directives, but I want to load entire pig scripts that contain REGISTER and DEFINE directives. When I do, pigServer.registerQuery() fails with:
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unrecognized alias REGISTER
>    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>    at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>    at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> Any suggestions?
>
> -Todd
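
One possible workaround for the REGISTER error in the quoted trace, sketched in plain Java: separate the REGISTER lines from the rest of the script before handing anything to PigServer. The class name ScriptSplitter and its fields are hypothetical, and the split on ';' is deliberately naive, assuming no semicolons inside string literals or comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the names do not come from the thread.
public class ScriptSplitter {
    public final List<String> jars = new ArrayList<>();
    public final List<String> statements = new ArrayList<>();

    // Naive split on ';'. Assumes no semicolons inside string literals or comments.
    public ScriptSplitter(String script) {
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (stmt.isEmpty()) {
                continue;
            }
            // Separate the first word from the remainder of the statement.
            String[] parts = stmt.split("\\s+", 2);
            if (parts.length == 2 && parts[0].equalsIgnoreCase("REGISTER")) {
                // Keep only the jar path, stripping optional single quotes.
                jars.add(parts[1].trim().replaceAll("^'|'$", ""));
            } else {
                // Restore the terminator that split() removed.
                statements.add(stmt + ";");
            }
        }
    }

    public static void main(String[] args) {
        ScriptSplitter s = new ScriptSplitter(
                "REGISTER myudfs.jar;\nDEFINE f com.example.F();\na = load 'in.txt';");
        System.out.println(s.jars);
        System.out.println(s.statements);
    }
}
```

Each entry in jars could then be passed to pigServer.registerJar(...) and each entry in statements to pigServer.registerQuery(...). Whether registerQuery accepts DEFINE in a given Pig version is an assumption worth checking against that version.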
