Pig >> mail # user >> String Representation of DataBag and its Schema


Re: String Representation of DataBag and its Schema
We managed to piece this together. It's not fully generic (we assume the
schema has a single field), but it gets the job done for unit testing.
--------------
package com.civicscience.util;

import org.apache.pig.ResourceSchema;
import org.apache.pig.builtin.Utf8StorageConverter;
import org.apache.pig.impl.util.CastUtils;
import org.apache.pig.impl.util.Utils;
import org.apache.pig.newplan.logical.relational.LogicalSchema;

import java.io.IOException;

public class CSPigUtils {
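    /**
     * Parses a Pig-formatted string into the Java object described by the
     * schema, e.g. schema "b:bag{t:tuple(x:int)}" with data "{(1),(2)}".
     * Only the first field of the schema is used.
     */
    public static Object getPigRepresentation(String schema, String data)
            throws IOException {
        Utf8StorageConverter caster = new Utf8StorageConverter();
        LogicalSchema ls = Utils.parseSchema(schema);
        ResourceSchema rs = new ResourceSchema(ls);
        ResourceSchema.ResourceFieldSchema[] fields = rs.getFields();
        // Utf8StorageConverter expects UTF-8 bytes, so avoid the platform default charset.
        return CastUtils.convertToType(caster, data.getBytes("UTF-8"), fields[0],
                fields[0].getType());
    }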
}
---------------
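For comparison, here is a rough sketch of the stack-based parsing idea
described in the quoted message below. It is not the original UDF and does
not use any Pig classes: BagStringParser, Node and Kind are hypothetical
names made up for illustration. It assumes only bags "{...}" and tuples
"(...)" (no maps), skips empty fields, and breaks on field values that
contain any of the delimiter characters, which is exactly the caveat raised
below.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of Pig: parses "{(1,a),(2,b)}" into a node tree. */
final class BagStringParser {
    enum Kind { BAG, TUPLE, ATOM }

    static final class Node {
        final Kind kind;
        final String text;                         // set only for ATOM nodes
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String text) { this.kind = kind; this.text = text; }

        /** Depth-first traversal that rebuilds the bag/tuple string. */
        @Override public String toString() {
            if (kind == Kind.ATOM) return text;
            StringBuilder sb = new StringBuilder().append(kind == Kind.BAG ? '{' : '(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i));
            }
            return sb.append(kind == Kind.BAG ? '}' : ')').toString();
        }
    }

    static Node parse(String input) {
        Deque<Node> stack = new ArrayDeque<>();    // currently open containers
        StringBuilder atom = new StringBuilder();  // characters of the field being read
        Node root = null;
        for (char c : input.toCharArray()) {
            if (c == '{' || c == '(') {
                if (atom.length() > 0 || (stack.isEmpty() && root != null)) {
                    throw new IllegalArgumentException("unexpected '" + c + "'");
                }
                Node n = new Node(c == '{' ? Kind.BAG : Kind.TUPLE, null);
                if (!stack.isEmpty()) stack.peek().children.add(n);  // link to parent
                stack.push(n);
            } else if (c == '}' || c == ')') {
                flush(atom, stack);
                if (stack.isEmpty()) throw new IllegalArgumentException("unbalanced '" + c + "'");
                Node closed = stack.pop();
                if (closed.kind != (c == '}' ? Kind.BAG : Kind.TUPLE)) {
                    throw new IllegalArgumentException("mismatched '" + c + "'");
                }
                if (stack.isEmpty()) root = closed;  // outermost container closed
            } else if (c == ',') {
                flush(atom, stack);
            } else {
                atom.append(c);
            }
        }
        if (!stack.isEmpty() || atom.length() > 0 || root == null) {
            throw new IllegalArgumentException("unbalanced or empty input");
        }
        return root;
    }

    /** Attaches the pending field text, if any, to the innermost open container. */
    private static void flush(StringBuilder atom, Deque<Node> stack) {
        if (atom.length() == 0) return;            // empty fields are skipped in this sketch
        if (stack.isEmpty()) throw new IllegalArgumentException("field outside any bag or tuple");
        stack.peek().children.add(new Node(Kind.ATOM, atom.toString()));
        atom.setLength(0);
    }

    public static void main(String[] args) {
        // The tree's toString() reproduces the input string.
        System.out.println(parse("{(1,a),(2,{(x)})}"));
    }
}
```

Unlike the Utf8StorageConverter approach above, this yields untyped strings
only; mapping each node to a Pig DataType would still need the schema.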
On Tue, Mar 19, 2013 at 1:20 PM, Dan DeCapria, CivicScience <
[EMAIL PROTECTED]> wrote:

> I'll give it an honest try, and any additional input from the community is
> greatly appreciated!
>
> I've been on this idea for a few days now.  I even implemented my own UDF
> parser by converting the input to a char[] array and pushing/popping on a
> Stack of Node Objects to generate the nested inner complex DataTypes as a
> Node tree. This worked well from a Node-linking standpoint, with a DFS
> traversal on the Node tree to rebuild the DataBag Object. But it has
> its caveats, as I have to create a UDF to generate the input for another
> input, and it assumes the fields are type safe from elements "{(})#," which
> isn't the case (i.e., a serialized json chararray for a field). So I was
> hoping for a more OTS solution using existing classes and methods given the
> String and its Schema a priori.
>
> Thank you for your help, and I'll keep this post updated on my progress
> towards a solution.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 12:54 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Ack, hit enter. I'd look at the LoadFunc interface, the PigStorage class,
> > and if you can't make it work without playing a little, let me know.
> >
> >
> > 2013/3/19 Jonathan Coveney <[EMAIL PROTECTED]>
> >
> > > doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> > > contributors have an easier way of doing this, but in the short term, I'd
> > > work on getting that to work. It's mainly just making sure you initialize
> > > it properly.
> > >
> > >
> > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > >> This would work, but the goal would be to *not* invoke local interactive
> > >> pig to execute a LOAD USING PigStorage() and pass the data into the UDF.
> > >> I was hoping to keep this completely in the Java and JUnit testing
> > >> universe.
> > >>
> > >> Looking over the PigStorage() doc
> > >> <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
> > >> would you know how to construct this process from a baseline PigStorage
> > >> Object, such as:
> > >>
> > >> PigStorage pigstorage = new PigStorage();
> > >>
> > >> Any ideas?
> > >>
> > >> -Dan
> > >>
> > >> On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > I definitely understand the benefits, I just wanted to understand your
> > >> > workflow so I could weigh in with what I would do.
> > >> >
> > >> > In your case, if you're going to be making these by hand, then I would
> > >> > mimic what PigStorage outputs, and then just load it in using
> > >> > PigStorage.
> > >> >
> > >> >
> > >> > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >> >
> > >> > > By hand; creating a new JUnit method to test a specific use case against
> > >> > > a functional requirement in the UDF.
> > >> > >
> > >> > > The UDFs I am testing are part of a larger ETL testing initiative