Pig, mail # user - Unit test UDF


Re: Unit test UDF
Mohit Anchlia 2012-04-24, 22:46
I was finally able to write a unit test, something like the one below.

It seems to work, so I think the way I understood these records is probably
correct.

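import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.junit.Assert;
import org.junit.Test;

// OUTPUT (the UDF under test) and PigUtil are my own classes.
public class OUTPUTTest {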
 private static final Logger log = Logger.getLogger(OUTPUTTest.class);
 TupleFactory mTupleFactory = TupleFactory.getInstance();
 BagFactory mBagFactory = BagFactory.getInstance();
 @Test
 public void evalFuncTest() throws IOException {
  // Raw input record, kept for reference only (the test does not use it).
  String record = "a b "
    + "{(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X "
    + "Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}";
  String records[][] = {
    { "a" },
    { "b" },
    { "ST:NC", "ZIP:28613", "CITY:Xxxxxxx",
      "NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx" },
    { "OCCUP:xxxxxxx xxxxx", "AGE:55", "MARITAL:Married" } };
  Tuple t = mTupleFactory.newTuple(4);
  loadTuple(t, records);
  OUTPUT Out = new OUTPUT();
  DataBag bag = Out.exec(t);
  //PigUtil.printBagAsString(bag);

  Tuple [] ts = PigUtil.getTuples(bag);

  String expectedValue = "a b 55 Xxxxxxx Married Xxxxx X &xxx; Xxxxx X "
    + "Xxxxxx xxxxxxx xxxxx NC 28613";

  Assert.assertEquals(expectedValue, ts[0].get(0));
 }
 static public void loadTuple(Tuple t, String[][] input)
   throws ExecException {
  for (int i = 0; i < input.length; i++) {
   log.info("Length " + input[i].length);
   if (input[i].length == 1) {
    t.set(i, input[i][0]);
   } else if (input[i].length > 1) {
    t.set(i, loadBag(t, input[i]));
   }
  }
 }
 static public DataBag loadBag(Tuple t, String[] input) throws
ExecException {
  DataBag bag = BagFactory.getInstance().newDefaultBag();
  for (int i = 0; i < input.length; i++) {
   Tuple f = TupleFactory.getInstance().newTuple(1);
   f.set(0, input[i]);
   bag.add(f);
  }
  return bag;
 }
}
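
For the earlier question of how to break the raw line into an array of
String: the records[][] array above was typed in by hand, but it could
probably be derived from the line itself. Below is a minimal sketch in plain
Java (no Pig classes). RecordSplitter is a hypothetical helper name, not
anything from Pig, and it assumes that fields are tab-separated and that every
bag holds single-field tuples written as (value), neither of which is
confirmed anywhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: turns one record line into the
// String[][] shape that loadTuple(Tuple, String[][]) above expects.
final class RecordSplitter {

    // Assumption: fields are separated by tabs. Each field becomes one row.
    static String[][] split(String line) {
        String[] fields = line.split("\t");
        String[][] rows = new String[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            rows[i] = isBag(field) ? splitBag(field) : new String[] { field };
        }
        return rows;
    }

    // A field is treated as a bag when it is wrapped in curly braces.
    static boolean isBag(String field) {
        return field.startsWith("{") && field.endsWith("}");
    }

    // "{(ST:NC),(ZIP:28613)}" becomes ["ST:NC", "ZIP:28613"].
    // Assumption: one value per tuple, and no value contains "),(".
    static String[] splitBag(String bag) {
        String inner = bag.substring(1, bag.length() - 1).trim();
        if (inner.isEmpty()) {
            return new String[0];
        }
        String[] parts = inner.split("\\)\\s*,\\s*\\(");
        List<String> items = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            String item = parts[i];
            // Only the first part still carries "(" and only the last ")".
            if (i == 0 && item.startsWith("(")) {
                item = item.substring(1);
            }
            if (i == parts.length - 1 && item.endsWith(")")) {
                item = item.substring(0, item.length() - 1);
            }
            items.add(item.trim());
        }
        return items.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a\tb\t{(ST:NC),(ZIP:28613)}\t{(AGE:55    ),(MARITAL:Married)}";
        System.out.println(Arrays.deepToString(split(line)));
    }
}
```

The result could then be passed straight to loadTuple(t, records). One caveat
with that pairing: loadTuple treats any row of length 1 as a plain string, so
a bag holding a single tuple would be stored as a scalar rather than as a bag,
and an empty bag would be skipped entirely.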
On Tue, Apr 24, 2012 at 11:38 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I am still having difficulty converting this line from a file to tuple.
>
> 1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml
> 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx;
> Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
>
> I looked at:
>
>  static public Tuple loadTuple(Tuple t, String[] input) throws
> ExecException {
>         for (int i = 0; i < input.length; i++) {
>             t.set(i, input[i]);
>         }
>         return t;
>     }
>
>
> but now my questions are:
> 1. How do I break it into an array of String?
> 2. Are the first 2 fields also tuples?
> 3. Do I just pass the bag in the input string?
>
> If someone could help me break down the line above so that I can call
> loadTuple, that would be helpful. It would also help me understand what
> that line is made up of.
>
>
>
> On Fri, Apr 20, 2012 at 9:43 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> The unit tests for TOP should be helpful?
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Apr 20, 2012, at 6:40 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>
>> > Though not exactly what you are asking for, there is a
>> > getTuplesFromConstantTupleStrings function in
>> > test/org/apache/pig/test/Util.java that converts string representations
>> > of tuples to tuple objects. It is an easier and more maintainable way of
>> > creating tuples in test cases.
>> >
>> > For example -  List<Tuple> expectedRes =
>> >            Util.getTuplesFromConstantTupleStrings(
>> >                    new String[] {
>> >                            "(10,20,30,40L)",
>> >                            "(11,21,31,41L)",
>> >                    });
>> >
>> > But it is not exposed as a public interface right now. It makes sense to
>> > make it part of a public interface.
>> >
>> > -Thejas
>> >
>> >
>> > On 4/20/12 7:48 AM, Mohit Anchlia wrote:
>> >> Thanks for your response. Yes, I am using those in my UDF eval function.
>> >> Actually, my question was about how to build the tuple. Is there a
>> >> utility method that would let me build my tuple with the following record
>> >> type. I need to populate the tuple in below format so that I can pass