Pig user mailing list: String Representation of DataBag and its Schema


Dan DeCapria, CivicScienc... 2013-03-18, 20:18
Jonathan Coveney 2013-03-18, 22:31
Dan DeCapria, CivicScienc... 2013-03-19, 13:37
Dan DeCapria, CivicScienc... 2013-03-19, 15:16
Jonathan Coveney 2013-03-19, 15:27
Dan DeCapria, CivicScienc... 2013-03-19, 15:37
Jonathan Coveney 2013-03-19, 15:43
Dan DeCapria, CivicScienc... 2013-03-19, 15:52
Jonathan Coveney 2013-03-19, 16:08
Dan DeCapria, CivicScienc... 2013-03-19, 16:40
Jonathan Coveney 2013-03-19, 16:53
Re: String Representation of DataBag and its Schema
Ack, hit enter. I'd look at the LoadFunc interface and the PigStorage class,
and if you can't make it work without playing around a little, let me know.
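The suggestion in this thread to mimic what PigStorage outputs means writing test records in PigStorage's text layout. A minimal sketch that builds such records from Java values, without Pig on the classpath, is below. The class and method names (`PigStorageText`, `bag`, `tuple`, `line`) are hypothetical; it assumes PigStorage's default tab delimiter, tuples rendered as `(...)`, bags as `{...}`, and nulls as empty fields, and it ignores maps and delimiter escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Pig: renders hand-written test values in the
// text layout PigStorage writes by default, so fixtures can be typed as Java
// values instead of raw strings.
final class PigStorageText {
    // Marker type so a bag can be told apart from a tuple (a plain List).
    static final class Bag {
        final List<List<?>> tuples;
        Bag(List<List<?>> tuples) { this.tuples = tuples; }
    }

    static Bag bag(List<?>... tuples) { return new Bag(Arrays.asList(tuples)); }

    static List<Object> tuple(Object... fields) { return Arrays.asList(fields); }

    // Tuples render as "(f1,f2,...)", bags as "{(...),(...)}"; null renders as an
    // empty field, matching how PigStorage writes nulls.
    static String render(Object value) {
        if (value == null) return "";
        if (value instanceof Bag) {
            return ((Bag) value).tuples.stream()
                    .map(PigStorageText::render)
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            return ((List<?>) value).stream()
                    .map(PigStorageText::render)
                    .collect(Collectors.joining(",", "(", ")"));
        }
        return value.toString();
    }

    // One record: top-level fields joined by PigStorage's default tab delimiter.
    static String line(Object... fields) {
        return Arrays.stream(fields)
                .map(PigStorageText::render)
                .collect(Collectors.joining("\t"));
    }
}
```

For example, `line(bag(tuple("a", tuple("b", "d"), "f")), 42L)` yields `{(a,(b,d),f)}` followed by a tab and `42`. The same strings can serve as the String inputs for a JUnit test. If a Pig run is acceptable, they can also be written to a file and loaded with `LOAD ... USING PigStorage()` and an `AS` schema.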
2013/3/19 Jonathan Coveney <[EMAIL PROTECTED]>

> Doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> contributors have an easier way of doing this, but in the short term, I'd
> work on getting that to work. It's mainly a matter of making sure you
> initialize it properly.
>
>
> 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
>> This would work, but the goal would be to *not* invoke local interactive
>> Pig to execute a LOAD USING PigStorage() and pass the data into the UDF.
>> I was hoping to keep this completely in the Java and JUnit testing universe.
>>
>> Looking over the PigStorage() doc
>> <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
>> would you know how to construct this process from a baseline PigStorage
>> Object, such as:
>>
>> PigStorage pigstorage = new PigStorage();
>>
>> Any ideas?
>>
>> -Dan
>>
>> On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>
>> > I definitely understand the benefits; I just wanted to understand your
>> > workflow so I could weigh in with what I would do.
>> >
>> > In your case, if you're going to be making these by hand, then I would
>> > mimic what PigStorage outputs, and then just load it in using PigStorage.
>> >
>> >
>> > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>> >
>> > > By hand: creating a new JUnit method to test a specific use case
>> > > against a functional requirement in the UDF.
>> > >
>> > > The UDFs I am testing are part of a larger ETL testing initiative I have
>> > > been undertaking. To ensure that the various states of legacy data are
>> > > correctly extracted and transformed into a Pig context, I am creating
>> > > specific JUnit tests for each UDF, containing specific use cases as test
>> > > methods.
>> > >
>> > > The motivation for using String inputs for the Data Objects and Schema
>> > > Objects is to improve on the conventional approach: creating Java Objects
>> > > and adding and appending nested Objects to build the desired complex-type
>> > > DataBag Object to pass to the UDF as use-case input. This simpler process
>> > > should improve scalability and rapid prototyping within the testing
>> > > scripts. It will also make the process more approachable for another
>> > > programmer writing additional unit tests.
>> > >
>> > > -Dan
>> > >
>> > > On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > How are you planning on generating these cases? By hand? Or automated?
>> > > >
>> > > >
>> > > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>> > > >
>> > > > > I typed out String string_databag in this example by hand, as the
>> > > > > input String for a JUnit test method. I am considering generating
>> > > > > many of these for case-specific unit testing of my UDFs.
>> > > > >
>> > > > > -Dan
>> > > > >
>> > > > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> > > > >
>> > > > > > How was string_databag generated?
>> > > > > >
>> > > > > >
>> > > > > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>> > > > > >
>> > > > > > > Expanding upon this, the Schema Object for the following use case
>> > > > > > > can be resolved from String inputs:
>> > > > > > >
>> > > > > > >         String string_databag = "{(a,(b,d),f)}";
>> > > > > > >         String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
>> > > > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
>> > > > > > >
>> > > > > > > The next step is to resolve a DataBag Object from String string_databag
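Resolving a DataBag from `string_databag` is the open step in the quoted message. A minimal, dependency-free sketch of that parsing is below. It is a stand-in, not Pig's own parser: the class name `PigTextParser` is hypothetical, bags and tuples come back as plain `java.util.List`s, leaf fields stay Strings, and nothing is cast against the Schema. Inside Pig, PigStorage delegates this conversion to its LoadCaster, `Utf8StorageConverter`, whose `bytesToBag` method takes the field's `ResourceFieldSchema`. That route may be worth trying first, but check the signature against your Pig version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: parses the PigStorage-style text form of a
// bag, e.g. "{(a,(b,d),f)}", into nested java.util.Lists. Bags and tuples both
// become List<Object>, leaf fields stay Strings, and an empty field becomes null.
// Maps, escaped delimiters, and casting leaves to the Schema's types are not handled.
final class PigTextParser {
    private final String s;
    private int pos = 0;

    private PigTextParser(String s) { this.s = s; }

    static Object parse(String text) {
        PigTextParser p = new PigTextParser(text);
        Object result = p.value();
        if (p.pos != text.length()) {
            throw new IllegalArgumentException("Trailing input at " + p.pos);
        }
        return result;
    }

    // A value is a bag "{...}", a tuple "(...)", or a bare field.
    private Object value() {
        char c = peek();
        if (c == '{') return container('{', '}');
        if (c == '(') return container('(', ')');
        int start = pos;
        while (pos < s.length() && ",)}".indexOf(s.charAt(pos)) < 0) pos++;
        return pos == start ? null : s.substring(start, pos);
    }

    // Comma-separated items between open and close; a bag may only hold tuples.
    private List<Object> container(char open, char close) {
        expect(open);
        List<Object> items = new ArrayList<>();
        if (peek() == close) { pos++; return items; }
        while (true) {
            if (open == '{' && peek() != '(') {
                throw new IllegalArgumentException("Bag element must be a tuple at " + pos);
            }
            items.add(value());
            char c = next();
            if (c == close) return items;
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at " + (pos - 1));
            }
        }
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private char next() {
        if (pos >= s.length()) throw new IllegalArgumentException("Unexpected end of input");
        return s.charAt(pos++);
    }

    private void expect(char c) {
        if (next() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + (pos - 1));
    }
}
```

Parsing `"{(a,(b,d),f)}"` gives a list that prints as `[[a, [b, d], f]]`. From there, a real DataBag could be assembled with `TupleFactory.getInstance().newTuple(List)` and `BagFactory.getInstance().newDefaultBag()`, converting each leaf to the type the Schema from `Utils.getSchemaFromString` declares.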
Dan DeCapria, CivicScienc... 2013-03-19, 17:20
William Oberman 2013-03-21, 15:51
Dan DeCapria, CivicScienc... 2013-03-19, 15:43