Pig >> mail # user >> Unit test UDF


Re: Unit test UDF
I was finally able to write a unit test, something like this:

It seems to work, so I think the way I understood these records is probably
correct.

import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.junit.Assert;
import org.junit.Test;

public class OUTPUTTest {
 private static final Logger log = Logger.getLogger(OUTPUTTest.class);
 TupleFactory mTupleFactory = TupleFactory.getInstance();
 BagFactory mBagFactory = BagFactory.getInstance();
 @Test
 public void evalFuncTest() throws IOException {
  // Raw input line, rejoined here since the mail client wrapped it.
  String record = "a b "
    + "{(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)}"
    + "        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}";
  String records[][] = {
    { "a" },
    { "b" },
    { "ST:NC", "ZIP:28613", "CITY:Xxxxxxx",
      "NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx" },
    { "OCCUP:xxxxxxx xxxxx", "AGE:55", "MARITAL:Married" } };
  Tuple t = mTupleFactory.newTuple(4);
  loadTuple(t, records);
  OUTPUT Out = new OUTPUT();
  DataBag bag = Out.exec(t);
  //PigUtil.printBagAsString(bag);

  Tuple [] ts = PigUtil.getTuples(bag);

  String expectedValue = "a b 55 Xxxxxxx Married Xxxxx X &xxx; Xxxxx X "
    + "Xxxxxx xxxxxxx xxxxx NC 28613";

  Assert.assertEquals(expectedValue, ts[0].get(0));
 }
 static public void loadTuple(Tuple t, String[][] input)
   throws ExecException {
  for (int i = 0; i < input.length; i++) {
   log.info("Length " + input[i].length);
   if (input[i].length == 1) {
    t.set(i, input[i][0]);
   } else if (input[i].length > 1) {
    t.set(i, loadBag(t, input[i]));
   }
  }
 }
 static public DataBag loadBag(Tuple t, String[] input)
   throws ExecException {
  DataBag bag = BagFactory.getInstance().newDefaultBag();
  for (int i = 0; i < input.length; i++) {
   Tuple f = TupleFactory.getInstance().newTuple(1);
   f.set(0, input[i]);
   bag.add(f);
  }
  return bag;
 }
}
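For anyone landing on this thread with the same parsing question asked further down: below is a minimal, hedged sketch (plain Java, no Pig dependencies; the `RecordSplitter` name and the tab-delimited assumption are mine, not from the original mail) of how a raw line like the one above could be broken into the `String[][]` shape that `loadTuple` expects:

```java
// Hypothetical helper: splits one raw record line into the String[][]
// used by loadTuple above. Assumes top-level fields are tab-separated
// and that bag fields are wrapped in {(...),(...)}.
public class RecordSplitter {
    public static String[][] split(String line) {
        String[] fields = line.split("\t");
        String[][] out = new String[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.startsWith("{") && f.endsWith("}")) {
                // Strip the outer braces, then split "(a),(b)" on "),(".
                String inner = f.substring(1, f.length() - 1);
                String[] parts = inner.split("\\),\\(");
                for (int j = 0; j < parts.length; j++) {
                    parts[j] = parts[j].replace("(", "")
                                       .replace(")", "").trim();
                }
                out[i] = parts;
            } else {
                // Scalar field: a one-element array.
                out[i] = new String[] { f };
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] r = split("a\tb\t{(ST:NC),(ZIP:28613)}");
        System.out.println(r.length + " " + r[2][1]); // prints "3 ZIP:28613"
    }
}
```

The result can then be handed straight to the `loadTuple` helper from the test class above.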
On Tue, Apr 24, 2012 at 11:38 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I am still having difficulty converting this line from a file to tuple.
>
> 1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml
> 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx;
> Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
>
> I looked at:
>
>  static public Tuple loadTuple(Tuple t, String[] input) throws
> ExecException {
>         for (int i = 0; i < input.length; i++) {
>             t.set(i, input[i]);
>         }
>         return t;
>     }
>
>
> but now my questions are:
> 1. How do I break it into an array of Strings?
> 2. Are the first 2 fields also tuples?
> 3. Do I just pass the Bag in the input string?
>
> If someone could help me break down the above line so that I can call
> loadTuple, that would be helpful. It would also help me understand what that
> line is made up of.
>
>
>
> On Fri, Apr 20, 2012 at 9:43 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> The unit tests for TOP should be helpful?
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Apr 20, 2012, at 6:40 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>
>> > Though not exactly what you are asking for - there is a
>> getTuplesFromConstantTupleStrings function in
>> test/org/apache/pig/test/Util.java that converts string representations of
>> tuples to tuple objects. It is an easier and more maintainable way of
>> creating tuples in test cases.
>> >
>> > For example -
>> >            List<Tuple> expectedRes =
>> >            Util.getTuplesFromConstantTupleStrings(
>> >                    new String[] {
>> >                            "(10,20,30,40L)",
>> >                            "(11,21,31,41L)",
>> >                    });
>> >
>> > But it is not exposed as a public interface right now. It makes sense to
>> make it part of a public interface.
>> >
>> > -Thejas
>> >
>> >
>> > On 4/20/12 7:48 AM, Mohit Anchlia wrote:
>> >> Thanks for your response. Yes, I am using those in my udf eval function.
>> >> Actually my question was around how do I build the tuple? Is there a
>> >> utility method that would let me build my tuple with the following
>> >> record type? I need to populate the tuple in the below format so that I can pass