Re: String Representation of DataBag and its Schema
We managed to piece this together.  It's not fully generic (it assumes the
schema has a single field), but it gets the job done for unit testing.
--------------
package com.civicscience.util;

import org.apache.pig.ResourceSchema;
import org.apache.pig.builtin.Utf8StorageConverter;
import org.apache.pig.impl.util.CastUtils;
import org.apache.pig.impl.util.Utils;
import org.apache.pig.newplan.logical.relational.LogicalSchema;

import java.io.IOException;

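public class CSPigUtils {
    /**
     * Converts the text in {@code data} into the Pig object described by
     * {@code schema}. Only the first field of the schema is used.
     */
    public static Object getPigRepresentation(String schema, String data)
            throws IOException {
        Utf8StorageConverter caster = new Utf8StorageConverter();
        // Parse the schema string, e.g. "b:{t:(x:int,y:chararray)}".
        LogicalSchema ls = Utils.parseSchema(schema);
        ResourceSchema rs = new ResourceSchema(ls);
        ResourceSchema.ResourceFieldSchema[] fields = rs.getFields();
        // The converter reads UTF-8, so don't rely on the platform charset.
        return CastUtils.convertToType(caster, data.getBytes("UTF-8"),
                fields[0], fields[0].getType());
    }
}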
---------------
On Tue, Mar 19, 2013 at 1:20 PM, Dan DeCapria, CivicScience <[EMAIL PROTECTED]> wrote:

> I'll give it an honest try, and any additional input from the community is
> greatly appreciated!
>
> I've been on this idea for a few days now.  I even implemented my own UDF
> parser by converting the input to a char[] array and pushing/popping on a
> Stack of Node Objects to generate the nested inner complex DataTypes as a
> Node tree. This worked well from a Node-linking standpoint, with a DFS
> traversal on the Node tree to rebuild the DataBag Object. But it has
> its caveats, as I have to create a UDF to generate the input for another
> input, and it assumes the fields are type safe from elements "{(})#," which
> isn't the case (i.e., a serialized json chararray for a field). So I was
> hoping for a more OTS solution using existing classes and methods given the
> String and its Schema a priori.
>
> Thank you for your help, and I'll keep this post updated on my progress
> towards a solution.
>
> -Dan
>
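The hand-rolled approach described above (scan the characters, push and pop
nodes on a stack, then rebuild with a depth-first walk) can be sketched roughly
as follows. This is not the parser from the thread and not part of Pig; the
names NaiveBagParser, Node and Kind are hypothetical, and the sketch ignores
maps ('#'), quoting and empty fields. It is only meant to show why a field
that itself contains "{", "(", ")" , "}" or "," gets mis-parsed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration only; not the parser from the thread, not a Pig API. */
final class NaiveBagParser {
    enum Kind { ROOT, BAG, TUPLE, FIELD }

    static final class Node {
        final Kind kind;
        final String text;                          // set only for FIELD leaves
        final List<Node> children = new ArrayList<>();
        Node(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Builds a Node tree, treating every '{', '(', ')', '}' and ',' as structure. */
    static Node parse(String input) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(Kind.ROOT, null));
        StringBuilder field = new StringBuilder();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '{':
                case '(': {
                    // Open a container, attach it to its parent, and descend into it.
                    Node n = new Node(c == '{' ? Kind.BAG : Kind.TUPLE, null);
                    stack.peek().children.add(n);
                    stack.push(n);
                    break;
                }
                case '}':
                case ')': {
                    // Close the current container after storing any pending field.
                    flush(field, stack.peek());
                    Kind expected = (c == '}') ? Kind.BAG : Kind.TUPLE;
                    if (stack.peek().kind != expected) {
                        throw new IllegalArgumentException("Unbalanced '" + c + "'");
                    }
                    stack.pop();
                    break;
                }
                case ',':
                    flush(field, stack.peek());
                    break;
                default:
                    field.append(c);
            }
        }
        Node root = stack.pop();
        if (root.kind != Kind.ROOT || field.length() > 0 || root.children.size() != 1) {
            throw new IllegalArgumentException("Malformed input: " + input);
        }
        return root.children.get(0);
    }

    /** Depth-first walk that rebuilds the text form from the tree. */
    static String render(Node n) {
        if (n.kind == Kind.FIELD) {
            return n.text;
        }
        StringBuilder sb = new StringBuilder().append(n.kind == Kind.BAG ? '{' : '(');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(render(n.children.get(i)));
        }
        return sb.append(n.kind == Kind.BAG ? '}' : ')').toString();
    }

    private static void flush(StringBuilder field, Node parent) {
        if (field.length() > 0) {
            parent.children.add(new Node(Kind.FIELD, field.toString()));
            field.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Plain fields survive a parse/render round trip.
        System.out.println(render(parse("{(1,a),(2,b)}")));      // {(1,a),(2,b)}
        // A tuple whose single field is a JSON string is silently read as a nested bag.
        Node broken = parse("({\"k\":1,\"j\":2})");
        System.out.println(broken.children.get(0).kind);         // BAG
    }
}
```

The second call in main is the caveat from the paragraph above: the JSON
chararray is one field, but the scanner has no way to tell its braces and
comma from Pig's own delimiters, so it builds a bag with two fields instead.
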
> On Tue, Mar 19, 2013 at 12:54 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Ack, hit enter. I'd look at the LoadFunc interface, the PigStorage class,
> > and if you can't make it work without playing a little, let me know.
> >
> >
> > 2013/3/19 Jonathan Coveney <[EMAIL PROTECTED]>
> >
> > > doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> > > contributors have an easier way of doing this, but in the short term, I'd
> > > work on getting that to work. It's mainly just making sure you initialize
> > > it properly.
> > >
> > >
> > > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > >> This would work, but the goal would be to *not* invoke local interactive
> > >> pig to execute a LOAD USING PigStorage() and pass the data into the UDF.
> > >> I was hoping to keep this completely in the Java and JUnit testing
> > >> universe.
> > >>
> > >> Looking over the PigStorage() doc
> > >> <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
> > >> would you know how to construct this process from a baseline PigStorage
> > >> Object, such as:
> > >>
> > >> PigStorage pigstorage = new PigStorage();
> > >>
> > >> Any ideas?
> > >>
> > >> -Dan
> > >>
> > >> On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > I definitely understand the benefits, I just wanted to understand your
> > >> > workflow so I could weigh in with what I would do.
> > >> >
> > >> > In your case, if you're going to be making these by hand, then I would
> > >> > mimic what PigStorage outputs, and then just load it in using
> > >> > PigStorage.
> > >> >
> > >> >
> > >> > 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >> >
> > >> > > By hand; creating a new JUnit method to test a specific use case
> > >> > > against a functional requirement in the UDF.
> > >> > >
> > >> > > The UDFs I am testing are part of a larger ETL testing initiative