

Pigify Data Input to UDF for Unit Testing
Dan DeCapria, CivicScienc... 2013-03-11, 19:35
First-time poster here! Really excited to get some feedback and contribute to Pig!

I am attempting to simplify the UDF input process in the context of scaling JUnit testing. Previously, to create valid Pig input for my UDFs under JUnit, I have had to build each layer of nesting by hand from org.apache.pig.data.* constructs, for every use case I want to unit test. I am looking for a quick way to simplify this process that scales to additional unit tests.

A use case is defined below. Assume the input schema is defined a priori. Assume also that the outputSchema is properly defined in the UDF to be unit tested. From illustrating the prior Pig process, I have the InputData in the form of the InputSchema that my UDF under test expects. Conceptually, the unit testing approach is as follows:

InputSchema:
    bag_a:bag{tuple_b:tuple(tuple_c1:tuple(tuple_d1:tuple(field_a:chararray,field_b:chararray)),field_e:chararray)}

OutputSchema:
    bag_a:bag{tuple_b:tuple(tuple_c1:tuple(tuple_d1:tuple(field_a:chararray,field_b:chararray),tuple_d2:tuple(field_c:chararray,field_d:chararray)),field_e:chararray)}

Prior (non-scalable) methodology:
    1. Create bag_a DataBag.
    2. Create tuple_b Tuple.
    3. Create tuple_c1 Tuple.
    4. Create tuple_d1 Tuple.
    5. Append data field_a to tuple_d1.
    6. Append data field_b to tuple_d1.
    7. Append tuple_d1 to tuple_c1.
    8. Append tuple_c1 to tuple_b.
    9. Append data field_e to tuple_b.
    10. Append tuple_b to bag_a.
    11. Unit test UDF(bag_a).

Is there a way to 'pigify' the InputData String, as it appears in the illustrate output of the prior Pig process, so that it can be fed into UDF(InputData) without my having to perform the prior methodology explicitly? An ideal solution would be of the form:

Awesome methodology:
    String_of_data_in_inputFormat: bag_a:bag{tuple_b:tuple(tuple_c1:tuple(tuple_d1:tuple(field_a:chararray,field_b:chararray)),field_b)}
    DataBag bag_a = pigify(String_of_data_in_inputFormat);
    Unit test UDF(bag_a).

Thanks in advance,
Dan DeCapria
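
To make the pigify(String) idea concrete, here is a minimal sketch of what such a helper could look like. It is not part of Pig: the class name PigifySketch, its nested Bag type, and the method name pigify are all hypothetical. To keep the example free of dependencies, it assumes the data arrives in the bracketed text form that Pig prints for nested values, such as {(((a,b)),e)}, and it uses plain Java lists as stand-ins for DataBag and Tuple. Every atom is kept as a String, which matches the all-chararray schema in the post.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Pig: parses Pig-style nested data text. */
class PigifySketch {

    /** Stand-in for a Pig DataBag: an ordered collection of tuples. */
    static final class Bag {
        final List<List<Object>> tuples = new ArrayList<>();
    }

    private final String text;
    private int pos = 0;

    private PigifySketch(String text) {
        this.text = text;
    }

    /** Parses a bag literal such as {(((a,b)),e)} into nested lists. */
    static Bag pigify(String data) {
        PigifySketch parser = new PigifySketch(data.trim());
        Bag bag = parser.parseBag();
        if (parser.pos != parser.text.length()) {
            throw new IllegalArgumentException("Trailing input at index " + parser.pos);
        }
        return bag;
    }

    // bag := '{' [ tuple (',' tuple)* ] '}'
    private Bag parseBag() {
        expect('{');
        Bag bag = new Bag();
        skipSpaces();
        if (peek() != '}') {
            do {
                skipSpaces();
                bag.tuples.add(parseTuple());
                skipSpaces();
            } while (consumeIf(','));
        }
        expect('}');
        return bag;
    }

    // tuple := '(' [ value (',' value)* ] ')'
    private List<Object> parseTuple() {
        expect('(');
        List<Object> tuple = new ArrayList<>();
        skipSpaces();
        if (peek() != ')') {
            do {
                tuple.add(parseValue());
                skipSpaces();
            } while (consumeIf(','));
        }
        expect(')');
        return tuple;
    }

    // value := bag | tuple | atom, where an atom runs up to the next delimiter
    private Object parseValue() {
        skipSpaces();
        char c = peek();
        if (c == '{') return parseBag();
        if (c == '(') return parseTuple();
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos).trim();
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    private boolean consumeIf(char c) {
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        skipSpaces();
        if (!consumeIf(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        Bag bagA = pigify("{(((a,b)),e)}");
        System.out.println(bagA.tuples);
    }
}
```

In a real test, the two places where the sketch builds a Bag or an ArrayList are where a DataBag and a Tuple from org.apache.pig.data would be created instead, so each unit test shrinks to one data string plus a call to the UDF. The sketch does not handle maps, quoted strings, or atoms that contain delimiter characters, and typed fields would need a conversion step driven by the schema.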