Mohit Anchlia 2012-04-20, 00:05
Is there a way I can unit test just my Pig UDF? What's the best way to unit test in Pig? I saw pigunittest but couldn't find a way to unit test a UDF.
Dmitriy Ryaboy 2012-04-20, 00:16
Hi Mohit,
We just write standard Java unit tests for Pig UDFs. You can see a ton of them here:
https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestStringUDFs.java
Does that help?

D
Mohit Anchlia 2012-04-20, 00:51
Thanks! I am trying to figure out how to create a Tuple object that also has bags in it. I have a record like this that I want to pass to the UDF as a tuple. Any info would be very helpful.

1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
Russell Jurney 2012-04-20, 01:43
DefaultTupleFactory and DefaultDataBagFactory. Just add the bag as a field of the tuple, and so forth.

Russell Jurney http://datasyndrome.com
Dmitriy Ryaboy 2012-04-20, 01:44
Something like this (not tested):

    List<Tuple> bagtuples = Lists.newArrayList();
    // populate inner tuples, then...
    DataBag myBag = BagFactory.getInstance().newDefaultBag(bagtuples);
    Tuple t = TupleFactory.getInstance().newTuple(myBag);

D
Mohit Anchlia 2012-04-20, 14:48
Thanks for your response. Yes, I am using those in my UDF eval function. Actually, my question was about how to build the tuple. Is there a utility method that would let me build a tuple from the following record type? I need to populate the tuple in the format below so that I can pass it in the unit test. It's tab delimited and also has bags.

1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
Thejas Nair 2012-04-21, 01:40
Though not exactly what you are asking for, there is a getTuplesFromConstantTupleStrings function in test/org/apache/pig/test/Util.java that converts string representations of tuples to tuple objects. It is an easier and more maintainable way of creating tuples in test cases.

For example:

    List<Tuple> expectedRes =
        Util.getTuplesFromConstantTupleStrings(
            new String[] {
                "(10,20,30,40L)",
                "(11,21,31,41L)",
            });

But it is not exposed as a public interface right now. It would make sense to make it part of a public interface.

-Thejas
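As a rough illustration of what turning a constant tuple string into typed fields involves, here is a minimal stand-alone Java sketch. It is not Pig's implementation and does not use Pig's Tuple class: ConstantTupleSketch and parse are hypothetical names, a plain java.util.List stands in for a tuple, and only flat tuples of integers, longs (L suffix) and bare strings are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Pig's Util.getTuplesFromConstantTupleStrings.
final class ConstantTupleSketch {

    // Parses a flat tuple string such as "(10,20,30,40L)" into a list of
    // Integer, Long (for an L suffix) or String values.
    static List<Object> parse(String s) {
        String t = s.trim();
        if (!t.startsWith("(") || !t.endsWith(")")) {
            throw new IllegalArgumentException("not a tuple string: " + s);
        }
        String body = t.substring(1, t.length() - 1);
        List<Object> fields = new ArrayList<>();
        if (body.isEmpty()) {
            return fields; // "()" is an empty tuple
        }
        for (String raw : body.split(",")) {
            String v = raw.trim();
            if (v.matches("-?\\d+[lL]")) {
                // Drop the suffix before parsing the long value.
                fields.add(Long.parseLong(v.substring(0, v.length() - 1)));
            } else if (v.matches("-?\\d+")) {
                fields.add(Integer.parseInt(v));
            } else {
                fields.add(v); // anything else is kept as a string
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("(10,20,30,40L)"));
    }
}
```

Nested tuples, bags and quoted strings containing commas are deliberately out of scope here; a real helper would need a proper parser for those.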
Russell Jurney 2012-04-21, 04:43
The unit tests for TOP should be helpful?

Russell Jurney http://datasyndrome.com
Mohit Anchlia 2012-04-24, 18:38
I am still having difficulty converting this line from a file to a tuple.

1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}

I looked at:

    static public Tuple loadTuple(Tuple t, String[] input) throws ExecException {
        for (int i = 0; i < input.length; i++) {
            t.set(i, input[i]);
        }
        return t;
    }

but now my questions are:
1. How do I break it into an array of String?
2. Are the first 2 fields also tuples?
3. Do I just pass the bag in the input string?

If someone could help me break down the above line so that I can call loadTuple, that would be helpful. It will also help me understand what the above line is made up of.
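One way to approach the first question, sketched in plain Java with no Pig dependency: split the line on tabs, then treat any field wrapped in braces as a bag and pull out the text of each parenthesised item. RecordSplitter, split and bagItems are hypothetical names, and the sketch assumes the fields really are tab separated and that bag items contain no nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Pig.
final class RecordSplitter {

    // Splits one tab-delimited line into fields. A field wrapped in {...}
    // is treated as a bag and broken into the text of its (...) items;
    // any other field becomes a one-element row.
    static String[][] split(String line) {
        String[] fields = line.split("\t");
        String[][] out = new String[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.startsWith("{") && f.endsWith("}")) {
                out[i] = bagItems(f.substring(1, f.length() - 1));
            } else {
                out[i] = new String[] { f };
            }
        }
        return out;
    }

    // Collects the text between each "(" and the next ")".
    // Assumes items are not nested.
    static String[] bagItems(String body) {
        List<String> items = new ArrayList<>();
        int open = body.indexOf('(');
        while (open >= 0) {
            int close = body.indexOf(')', open);
            if (close < 0) {
                break; // unbalanced input: stop rather than guess
            }
            items.add(body.substring(open + 1, close));
            open = body.indexOf('(', close);
        }
        return items.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a\tb\t{(ST:NC),(ZIP:28613)}\t{(AGE:55),(MARITAL:Married)}";
        for (String[] field : split(line)) {
            System.out.println(String.join(" | ", field));
        }
    }
}
```

The resulting String[][] has one row per top-level field: a single element for a scalar and one element per item for a bag. Each bag row could then be turned into a DataBag of single-field tuples with the factories mentioned earlier in the thread. Note that a bag holding exactly one item also yields a one-element row, so code that tells scalars from bags by row length alone cannot distinguish the two.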
Mohit Anchlia 2012-04-24, 22:46
I was finally able to write a unit test, something like this. It seems to work, so I think the way I understood these records is probably correct.

    public class OUTPUTTest {

        private static final Logger log = Logger.getLogger(OUTPUTTest.class);

        TupleFactory mTupleFactory = TupleFactory.getInstance();
        BagFactory mBagFactory = BagFactory.getInstance();

        @Test
        public void evalFuncTest() throws IOException {
            String record = "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}";
            // One row per top-level field: one element for a scalar,
            // several elements for the items of a bag.
            String records[][] = {
                { "a" },
                { "b" },
                { "ST:NC", "ZIP:28613", "CITY:Xxxxxxx", "NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx" },
                { "OCCUP:xxxxxxx xxxxx", "AGE:55", "MARITAL:Married" }
            };
            Tuple t = mTupleFactory.newTuple(4);
            loadTuple(t, records);
            OUTPUT Out = new OUTPUT();
            DataBag bag = Out.exec(t);
            //PigUtil.printBagAsString(bag);
            Tuple [] ts = PigUtil.getTuples(bag);
            String expectedValue = "a b 55 Xxxxxxx Married Xxxxx X &xxx; Xxxxx X Xxxxxx xxxxxxx xxxxx NC 28613";
            Assert.assertEquals(expectedValue, ts[0].get(0));
        }

        static public void loadTuple(Tuple t, String[][] input) throws ExecException {
            for (int i = 0; i < input.length; i++) {
                log.info("Length " + input[i].length);
                if (input[i].length == 1) {
                    // Single element: store it as a scalar field.
                    t.set(i, input[i][0]);
                } else if (input[i].length > 1) {
                    // Several elements: store them as a bag.
                    t.set(i, loadBag(t, input[i]));
                }
            }
        }

        static public DataBag loadBag(Tuple t, String[] input) throws ExecException {
            DataBag bag = BagFactory.getInstance().newDefaultBag();
            for (int i = 0; i < input.length; i++) {
                // Wrap each item in its own single-field tuple.
                Tuple f = TupleFactory.getInstance().newTuple(1);
                f.set(0, input[i]);
                bag.add(f);
            }
            return bag;
        }
    }