Pig >> mail # user >> String Representation of DataBag and its Schema


Dan DeCapria, CivicScienc... 2013-03-18, 20:18
Jonathan Coveney 2013-03-18, 22:31
Dan DeCapria, CivicScienc... 2013-03-19, 13:37
Dan DeCapria, CivicScienc... 2013-03-19, 15:16
Jonathan Coveney 2013-03-19, 15:27
Dan DeCapria, CivicScienc... 2013-03-19, 15:37
Jonathan Coveney 2013-03-19, 15:43
Dan DeCapria, CivicScienc... 2013-03-19, 15:52
Jonathan Coveney 2013-03-19, 16:08
Re: String Representation of DataBag and its Schema
This would work, but the goal would be to *not* invoke local interactive
pig to execute a LOAD USING PigStorage() and pass the data into the UDF.  I
was hoping to keep this completely in the Java and JUnit testing universe.

Looking over the PigStorage() doc
<https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
would you know how to construct this process from a baseline PigStorage
Object, such as:

PigStorage pigstorage = new PigStorage();

Any ideas?

-Dan
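
A minimal sketch of one way to stay inside Java and JUnit: a small hand-written parser that turns the bag string into nested lists, which a test could then copy into Pig objects. `BagStringParser` is a hypothetical helper, not part of Pig. It assumes plain atoms with no escaping, ignores the schema, does not handle maps, and leaves every atom as a String.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): parses PigStorage-style text such as
 * "{(a,(b,d),f)}" into nested Lists. Bags and tuples both become List<Object>;
 * atoms stay as Strings. Assumes no escaping, no maps, no type conversion.
 */
class BagStringParser {
    private final String text;
    private int pos = 0;

    private BagStringParser(String text) {
        this.text = text;
    }

    /** Parses one complete value and rejects any trailing characters. */
    static Object parse(String text) {
        BagStringParser p = new BagStringParser(text.trim());
        Object value = p.parseValue();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected text at index " + p.pos);
        }
        return value;
    }

    private Object parseValue() {
        char c = peek();
        if (c == '{') return parseSequence('{', '}'); // bag
        if (c == '(') return parseSequence('(', ')'); // tuple
        return parseAtom();
    }

    private List<Object> parseSequence(char open, char close) {
        pos++; // consume the opening delimiter that parseValue just peeked
        List<Object> items = new ArrayList<>();
        if (peek() == close) { // empty bag or tuple
            pos++;
            return items;
        }
        while (true) {
            if (open == '{' && peek() != '(') {
                throw new IllegalArgumentException("Bag element must be a tuple at index " + pos);
            }
            items.add(parseValue());
            char c = peek();
            if (c != ',' && c != close) {
                throw new IllegalArgumentException(
                        "Expected ',' or '" + close + "' at index " + pos);
            }
            pos++; // consume the separator or the closing delimiter
            if (c == close) return items;
        }
    }

    /** Reads characters up to the next delimiter; the atom may be empty. */
    private String parseAtom() {
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    /** Returns the current character, or '\0' at end of input. */
    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("{(a,(b,d),f)}")); // prints [[a, [b, d], f]]
    }
}
```

In a real test the nested lists would still need converting to `Tuple` and `DataBag` instances (for example via Pig's `TupleFactory` and `BagFactory`), with each atom cast according to the Schema.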

On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> I definitely understand the benefits, I just wanted to understand your
> workflow so I could weigh in with what I would do.
>
> In your case, if you're going to be making these by hand, then I would
> mimic what PigStorage outputs, and then just load it in using PigStorage.
>
>
> 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > By hand; creating a new JUnit method to test a specific use case against a
> > functional requirement in the UDF.
> >
> > The UDFs I am testing are part of a larger ETL testing initiative I have
> > been undertaking.  To ensure that the various states of legacy data are
> > correctly extracted and transformed into a Pig context, I am creating
> > specific JUnit tests per each UDF containing specific use cases as testing
> > methods.
> >
> > Motivation to use String inputs for the Data Objects and Schema Objects is
> > the improvement on the conventional approach - creating Java Objects and
> > adding and appending nested Objects to create the desired complex type
> > DataBag Object to pass to the UDF as use case input. This simpler process
> > I'm looking for should improve scalability and rapid prototyping within
> > the testing scripts.  It will also make the process more approachable for
> > another programmer to write additional unit tests.
> >
> > -Dan
> >
> > On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >
> > > How are you planning on generating these cases? By hand? Or automated?
> > >
> > >
> > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > > > String string_databag in this example was typed out by me, as the input
> > > > String for a JUnit test method. I am considering generating many of these
> > > > for case specific unit testing of my UDFs.
> > > >
> > > > -Dan
> > > >
> > > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > how was string_databag generated?
> > > > >
> > > > >
> > > > > 2013/3/19 Dan DeCapria, CivicScience <
> [EMAIL PROTECTED]>
> > > > >
> > > > > > > Expanding upon this, the following use case's Schema Object can be
> > > > > > > resolved from inputs:
> > > > > >
> > > > > >         String string_databag = "{(a,(b,d),f)}";
> > > > > >         String string_schema > > > > > >
> > "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
> > > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
> > > > > >
> > > > > > > Next step is to resolve a DataBag Object from String string_databag and
> > > > > > > the Schema Object.
> > > > > >
> > > > > > -Dan
> > > > > >
> > > > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Thank you for your reply.
> > > > > > >
> > > > > > > > The problem is I cannot find a methodology to go from a String
> > > > > > > > representation of a complex data type to a nested Object of pig DataTypes.
> > > > > > > > I looked over the pig 0.10.1 docs, but cannot find a way to go from String
> > > > > > > > and Schema to pig DataType Object.
> > > > > > >
> > > > > > > > For context, I am generating these Strings for my own JUnit testing of
> > > > > > > > other UDFs.  Currently, for complex types, I have to generate each nesting

Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist
Jonathan Coveney 2013-03-19, 16:53
Jonathan Coveney 2013-03-19, 16:54
Dan DeCapria, CivicScienc... 2013-03-19, 17:20
William Oberman 2013-03-21, 15:51
Dan DeCapria, CivicScienc... 2013-03-19, 15:43