We've been using PigUnit for a little while now and wanted to see if the Pig community would be okay with us posting a patch or two to add the following:
1) Support for testing multiple inputs and multiple outputs
One of our devs put it this way: PigUnit has a really nice method, assertOutput(String inputAlias, String[] inputValues, String outputAlias, String[] expectedOutputValues). That method lets you override an input alias with a hardcoded list of values, so the script doesn't actually have to read that input from HDFS or Cassandra. It then runs the script and checks the specified output alias against the expected set of values. It's a really nice way to test your entire Pig script with a single method call, but only if your script has exactly one input and one output. If you want to test more complicated scripts, you have to jump through some hoops to override additional inputs. It would be fairly easy to change PigUnit so that it can override any number of inputs and check any number of outputs. That's basically the change I put into the base testing class I wrote, but it would be better to push it into PigUnit itself, and it could easily be done in an afternoon.
Does this sound reasonable, and is it something we could hack on at our Austin hack day tomorrow?
2) Javadoc for the PigUnit test classes to make them more readable.
Should we just create a couple of tickets for this? We want to make sure that's the right route to take, since we're trying to get bootstrapped on helping out where we can.
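To make item 1 a bit more concrete, here is a rough sketch of the shape such a multi-input, multi-output check could take. It is not PigUnit code and not the actual base class mentioned above: the class MultiAliasCheck, its methods input/expect/verify, and the "runner" function that stands in for "run the script with these aliases overridden" are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch only; none of these names exist in PigUnit.
 * Collects any number of input overrides and expected outputs, keyed by alias.
 */
class MultiAliasCheck {
    // alias -> hardcoded rows that would replace the real load for that alias
    private final Map<String, List<String>> inputs = new LinkedHashMap<>();
    // alias -> rows we expect that alias to contain after the script runs
    private final Map<String, List<String>> expected = new LinkedHashMap<>();

    MultiAliasCheck input(String alias, String... rows) {
        inputs.put(alias, Arrays.asList(rows));
        return this;
    }

    MultiAliasCheck expect(String alias, String... rows) {
        expected.put(alias, Arrays.asList(rows));
        return this;
    }

    /**
     * Runs the script through the supplied runner (an assumption: something that
     * takes the overridden inputs and returns the rows of each output alias) and
     * returns one message per alias whose rows differ from what was expected.
     */
    List<String> verify(Function<Map<String, List<String>>, Map<String, List<String>>> runner) {
        Map<String, List<String>> actual = runner.apply(inputs);
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : expected.entrySet()) {
            List<String> got = actual.get(e.getKey());
            if (!e.getValue().equals(got)) {
                failures.add(e.getKey() + ": expected " + e.getValue() + " but got " + got);
            }
        }
        return failures;
    }
}
```

A test would chain as many input(...) and expect(...) calls as the script needs and then assert that verify(...) returns an empty list. Inside PigUnit itself the runner would presumably be replaced by the existing script-execution logic, so the patch would mostly be about accepting maps of aliases instead of a single pair.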