Pig >> mail # user >> String Representation of DataBag and its Schema


Re: String Representation of DataBag and its Schema
I definitely understand the benefits; I just wanted to understand your
workflow so I could weigh in with what I would do.

In your case, if you're going to be making these by hand, then I would
mimic what PigStorage outputs, and then just load it in using PigStorage.
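
To make the "text in, nested object out" idea concrete, here is a minimal sketch that uses only the JDK rather than Pig's own classes. It parses the bag text from this thread, "{(a,(b,d),f)}", into nested Lists. The class name BagTextParser is hypothetical, bags and tuples both become plain Lists, no schema is applied (every leaf stays a String), and maps and escaping are not handled. It is a stand-in for illustration, not how PigStorage itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-JDK stand-in (not Pig code): parses PigStorage-style text such as
 * "{(a,(b,d),f)}" into nested Lists. A bag or tuple becomes a List<Object>;
 * every leaf stays a String because no schema is applied here.
 */
class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses one complete value and rejects trailing characters. */
    static Object parse(String text) {
        BagTextParser p = new BagTextParser(text.trim());
        Object value = p.parseValue();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected text at index " + p.pos);
        }
        return value;
    }

    private Object parseValue() {
        if (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '{') return parseGroup('}');
            if (c == '(') return parseGroup(')');
        }
        return parseLeaf();
    }

    /** Parses the comma-separated items of a bag or tuple up to its closing character. */
    private List<Object> parseGroup(char close) {
        List<Object> items = new ArrayList<>();
        pos++; // skip the opening brace or parenthesis
        if (peek() == close) { // empty bag or tuple
            pos++;
            return items;
        }
        while (true) {
            items.add(parseValue());
            char c = peek();
            if (c != close && c != ',') {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at index " + pos);
            }
            pos++;
            if (c == close) return items;
        }
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    /** Reads a leaf up to the next structural character. */
    private String parseLeaf() {
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) pos++;
        return text.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("{(a,(b,d),f)}"));
    }
}
```

In a JUnit method, a test case would then be one line of input text plus an assertion on the parsed structure, which is the "few lines of code, easily modifiable" shape asked for earlier in the thread. Converting leaves to the types named in a schema would be a separate step that this sketch leaves out.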
2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>

> By hand; creating a new JUnit method to test a specific use case against a
> functional requirement in the UDF.
>
> The UDFs I am testing are part of a larger ETL testing initiative I have
> been undertaking.  To ensure that the various states of legacy data are
> correctly extracted and transformed into a Pig context, I am creating
> specific JUnit tests for each UDF, with individual use cases as test
> methods.
>
> The motivation for using String inputs for the data and schema objects is
> to improve on the conventional approach: creating Java objects and
> appending nested objects by hand to build the complex DataBag that is
> passed to the UDF as use-case input. The simpler process I'm looking for
> should improve scalability and rapid prototyping within the testing
> scripts.  It will also make it easier for another programmer to write
> additional unit tests.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > How are you planning on generating these cases? By hand? Or automated?
> >
> >
> > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> >
> > > String string_databag in this example was typed out by me, as the input
> > > String for a JUnit test method. I am considering generating many of these
> > > for case specific unit testing of my UDFs.
> > >
> > > -Dan
> > >
> > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >
> > > > how was string_databag generated?
> > > >
> > > >
> > > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > > >
> > > > > Expanding upon this, the following use case's Schema Object can be resolved
> > > > > from inputs:
> > > > >
> > > > >         String string_databag = "{(a,(b,d),f)}";
> > > > >         String string_schema =
> > > > >             "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
> > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
> > > > >
> > > > > Next step is to resolve a DataBag Object from String string_databag and the
> > > > > Schema Object.
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Thank you for your reply.
> > > > > >
> > > > > > The problem is I cannot find a methodology to go from a String
> > > > > > representation of a complex data type to a nested Object of pig DataTypes.
> > > > > > I looked over the pig 0.10.1 docs, but cannot find a way to go from String
> > > > > > and Schema to pig DataType Object.
> > > > > >
> > > > > > For context, I am generating these Strings for my own JUnit testing of
> > > > > > other UDFs.  Currently, for complex types, I have to generate each nesting
> > > > > > from Tuple and DataBag factories, append data, and nest them manually.  For
> > > > > > larger unit tests, this process becomes unwieldy (hundreds of lines per
> > > > > > method, non-dynamic), and it would be much simpler to go directly from a
> > > > > > String and a Schema to a DataBag Object for UDF testing (few lines of code,
> > > > > > easily modifiable).
> > > > > >
> > > > > > -Dan
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > >> Why not just use PigStorage? This is essentially what it does. It saves a
> > > > > > >> bag as text, and then loads it again.
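
For hand-written test inputs it helps to see what "a bag as text" looks like. The sketch below renders a nested structure in the brace-and-parenthesis notation used for string_databag in this thread. It is plain JDK code, not Pig's implementation: the class name BagTextWriter is hypothetical, a tuple is modelled as a List, a bag as a List of tuples, nested Lists are always written as tuples, and writing a null field as empty text is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-JDK stand-in (not Pig code) that writes nested Lists as bag/tuple text. */
class BagTextWriter {

    /** Renders a tuple (a List) as "(v1,v2,...)"; nested Lists become nested tuples. */
    static String tuple(List<?> fields) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            Object field = fields.get(i);
            if (field instanceof List) {
                sb.append(tuple((List<?>) field));
            } else if (field != null) {
                sb.append(field); // a null field is written as empty text in this sketch
            }
        }
        return sb.append(')').toString();
    }

    /** Renders a bag (a List of tuples) as "{(...),(...)}". */
    static String bag(List<? extends List<?>> tuples) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < tuples.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(tuple(tuples.get(i)));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Same shape as the schema in this thread: (chararray, (chararray, long), long)
        List<Object> inner = Arrays.<Object>asList("b", 4L);
        List<Object> row = Arrays.<Object>asList("a", inner, 6L);
        System.out.println(bag(Arrays.asList(row)));
    }
}
```

The values 4 and 6 are made up for illustration so that the fields declared as long in the schema actually hold numbers.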