Pig >> mail # user >> String Representation of DataBag and its Schema


Re: String Representation of DataBag and its Schema
I definitely understand the benefits; I just wanted to understand your
workflow so I could weigh in with what I would do.

In your case, if you're going to be making these by hand, then I would
mimic what PigStorage outputs, and then just load it in using PigStorage.
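
To make the text form concrete, here is a minimal, illustrative sketch in
plain Java (no Pig classes). It assumes the PigStorage-style layout used in
this thread, with bags in braces, tuples in parentheses, and fields separated
by commas, and turns such a string into nested lists. The class name
BagTextSketch and its parse method are hypothetical, a List stands in for
both bag and tuple, and every scalar stays a raw string; applying a Schema's
types is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: parses PigStorage-style text such as
// "{(a,(b,d),f)}" into nested lists. A List stands in for both bag and tuple.
class BagTextSketch {
    private final String text;
    private int pos = 0;

    private BagTextSketch(String text) {
        this.text = text;
    }

    static Object parse(String text) {
        BagTextSketch p = new BagTextSketch(text.trim());
        Object value = p.parseValue();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected trailing text at " + p.pos);
        }
        return value;
    }

    private Object parseValue() {
        if (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '{') return parseGroup('}');
            if (c == '(') return parseGroup(')');
        }
        return parseAtom();
    }

    // Reads comma-separated values up to the given closing delimiter.
    private List<Object> parseGroup(char close) {
        List<Object> items = new ArrayList<>();
        pos++; // skip the opening delimiter
        if (peek() == close) { // empty bag or tuple
            pos++;
            return items;
        }
        while (true) {
            items.add(parseValue());
            char c = peek();
            pos++;
            if (c == close) return items;
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at " + (pos - 1));
            }
        }
    }

    // Reads a scalar as a raw string; typing by schema is left out of this sketch.
    private String parseAtom() {
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) pos++;
        return text.substring(start, pos);
    }

    private char peek() {
        if (pos >= text.length()) throw new IllegalArgumentException("Unexpected end of input");
        return text.charAt(pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("{(a,(b,d),f)}")); // prints [[a, [b, d], f]]
    }
}
```

In a real test, loading the same text through PigStorage would give you
actual DataBag and Tuple objects; the sketch only shows how the nesting in
the string maps onto nested objects.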
2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>

> By hand; creating a new JUnit method to test a specific use case against a
> functional requirement in the UDF.
>
> The UDFs I am testing are part of a larger ETL testing initiative I have
> been undertaking.  To ensure that the various states of legacy data are
> correctly extracted and transformed into a Pig context, I am creating
> JUnit tests for each UDF, with specific use cases as test methods.
>
> The motivation for using String inputs for the Data Objects and Schema
> Objects is to improve on the conventional approach: creating Java Objects
> and appending nested Objects one by one to build the complex DataBag
> Object that is passed to the UDF as use case input. The simpler process
> I'm looking for should improve scalability and rapid prototyping within
> the testing scripts. It will also make it more approachable for another
> programmer to write additional unit tests.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > How are you planning on generating these cases? By hand? Or automated?
> >
> >
> > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> >
> > > String string_databag in this example was typed out by me, as the input
> > > String for a JUnit test method. I am considering generating many of
> > > these for case-specific unit testing of my UDFs.
> > >
> > > -Dan
> > >
> > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >
> > > > How was string_databag generated?
> > > >
> > > >
> > > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > > >
> > > > > Expanding upon this, the following use case's Schema Object can be
> > > > > resolved from inputs:
> > > > >
> > > > >         String string_databag = "{(a,(b,d),f)}";
> > > > >         String string_schema =
> > > > >             "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
> > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
> > > > >
> > > > > Next step is to resolve a DataBag Object from String string_databag
> > > > > and the Schema Object.
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
> > > > > [EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Thank you for your reply.
> > > > > >
> > > > > > The problem is that I cannot find a way to go from a String
> > > > > > representation of a complex data type to a nested Object of Pig
> > > > > > DataTypes. I looked over the Pig 0.10.1 docs, but cannot find a way
> > > > > > to go from a String and a Schema to a Pig DataType Object.
> > > > > >
> > > > > > For context, I am generating these Strings for my own JUnit testing
> > > > > > of other UDFs. Currently, for complex types, I have to generate each
> > > > > > nesting from Tuple and DataBag factories, append data, and nest them
> > > > > > manually. For larger unit tests, this process becomes unwieldy
> > > > > > (hundreds of lines per method, non-dynamic), and it would be much
> > > > > > simpler to go directly from a String and a Schema to a DataBag
> > > > > > Object for UDF testing (few lines of code, easily modifiable).
> > > > > >
> > > > > > -Dan
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > >> Why not just use PigStorage? This is essentially what it does.
> > > > > > >> It saves a bag as text, and then loads it again.