Pig user mailing list


Re: String Representation of DataBag and its Schema
The String string_databag in this example was typed out by me, as the input
String for a JUnit test method. I am considering generating many of these
for case-specific unit testing of my UDFs.

-Dan

On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> how was string_databag generated?
>
>
> 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > Expanding upon this, the following use case's Schema Object can be
> > resolved from inputs:
> >
> >         String string_databag = "{(a,(b,d),f)}";
> >         String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
> >         Schema schema = Utils.getSchemaFromString(string_schema);
> >
> > Next step is to resolve a DataBag Object from String string_databag and
> > the Schema Object.
> >
> > -Dan
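The text form in these examples (a bag in `{}`, tuples in `()`, fields separated by commas) can be split into nested structures with a short recursive-descent parser. The sketch below is a plain-Java stand-in, not Pig's API: `TupleValue`, `BagValue` and `PigTextParser` are hypothetical names, leaves stay untyped `String`s, and maps and escaped delimiters are not handled. Within Pig itself, `Utf8StorageConverter` (the caster `PigStorage` uses) implements `LoadCaster.bytesToBag(byte[], ResourceFieldSchema)`, which appears to be the schema-aware version of this step; its exact signature is worth checking against the 0.10.1 javadoc.

```java
import java.util.ArrayList;

// Hypothetical stand-ins for Pig's Tuple and DataBag, for illustration only.
final class TupleValue extends ArrayList<Object> {}
final class BagValue extends ArrayList<TupleValue> {}

// Minimal recursive-descent parser for bag text such as "{(a,(b,d),f)}".
// Assumptions: no maps, no escaped delimiters, leaf values stay as Strings.
final class PigTextParser {
    private final String s;
    private int pos = 0;

    private PigTextParser(String s) { this.s = s; }

    static BagValue parseBag(String text) {
        PigTextParser p = new PigTextParser(text.trim());
        BagValue out = p.bag();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Trailing input at position " + p.pos);
        }
        return out;
    }

    // bag := '{' [tuple (',' tuple)*] '}'
    private BagValue bag() {
        expect('{');
        BagValue out = new BagValue();
        if (peek() != '}') {
            do { out.add(tuple()); } while (accept(','));
        }
        expect('}');
        return out;
    }

    // tuple := '(' [field (',' field)*] ')'
    private TupleValue tuple() {
        expect('(');
        TupleValue out = new TupleValue();
        if (peek() != ')') {
            do { out.add(field()); } while (accept(','));
        }
        expect(')');
        return out;
    }

    // field := bag | tuple | atom; an empty atom is read as null
    private Object field() {
        char c = peek();
        if (c == '{') return bag();
        if (c == '(') return tuple();
        int start = pos;
        while (pos < s.length() && ",)}".indexOf(s.charAt(pos)) < 0) pos++;
        String atom = s.substring(start, pos);
        return atom.isEmpty() ? null : atom;
    }

    private char peek() {
        if (pos >= s.length()) throw new IllegalArgumentException("Unexpected end of input");
        return s.charAt(pos);
    }

    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
    }
}
```

`PigTextParser.parseBag("{(apples,1024)}")` yields one tuple whose fields are the strings `apples` and `1024`; turning `1024` into a `long` is the part the schema has to drive.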
> >
> > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <[EMAIL PROTECTED]> wrote:
> >
> > > Thank you for your reply.
> > >
> > > The problem is I cannot find a method to go from a String
> > > representation of a complex data type to a nested Object of Pig
> > > DataTypes. I looked over the Pig 0.10.1 docs, but cannot find a way to
> > > go from a String and a Schema to a Pig DataType Object.
> > >
> > > For context, I am generating these Strings for my own JUnit testing of
> > > other UDFs. Currently, for complex types, I have to build each nesting
> > > level from the Tuple and DataBag factories, append data, and nest them
> > > manually. For larger unit tests, this process becomes unwieldy (hundreds
> > > of lines per method, non-dynamic), and it would be much simpler to go
> > > directly from a String and a Schema to a DataBag Object for UDF testing
> > > (a few lines of code, easily modifiable).
> > >
> > > -Dan
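Short of parsing strings, much of the factory boilerplate described above can be hidden behind two varargs helpers, so a nested fixture reads almost like its text form. A minimal sketch, assuming hypothetical helpers `tuple(...)` and `bag(...)` that build plain-Java stand-ins; wired to Pig, they would presumably wrap `TupleFactory.getInstance().newTuple(List)` and `BagFactory.getInstance().newDefaultBag(List<Tuple>)`, though those signatures should be checked for your version. The `toString` output mimics the text form used in this thread and does not escape commas or braces inside values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical fixture helpers; plain-Java stand-ins for Pig tuples and bags.
final class Fixtures {
    // A tuple: ordered fields, printed as "(f1,f2,...)".
    static final class Tup extends ArrayList<Object> {
        Tup(List<Object> fields) { super(fields); }
        @Override public String toString() { return join("(", this, ")"); }
    }

    // A bag: a list of tuples, printed as "{(...),(...)}".
    static final class Bag extends ArrayList<Tup> {
        Bag(List<Tup> tuples) { super(tuples); }
        @Override public String toString() { return join("{", this, "}"); }
    }

    static Tup tuple(Object... fields) { return new Tup(Arrays.asList(fields)); }

    static Bag bag(Tup... tuples) { return new Bag(Arrays.asList(tuples)); }

    // Nested tuples and bags print through their own toString; null prints as "".
    private static String join(String open, List<?> items, String close) {
        StringJoiner joiner = new StringJoiner(",", open, close);
        for (Object item : items) joiner.add(item == null ? "" : item.toString());
        return joiner.toString();
    }
}
```

`Fixtures.bag(Fixtures.tuple("a", Fixtures.tuple("b", "d"), "f"))` prints `{(a,(b,d),f)}`, and `Fixtures.bag(Fixtures.tuple("apples", 1024L))` prints `{(apples,1024)}` while keeping `1024L` as a `Long`.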
> > >
> > >
> > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >
> > >> Why not just use PigStorage? This is essentially what it does. It saves
> > >> a bag as text, and then loads it again.
> > >>
> > >> I suppose the question becomes: why do you need to do this?
> > >>
> > >>
> > >> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >>
> > >> > In Java, I am trying to convert a DataBag from its String
> > >> > representation, together with its schema String, into a valid
> > >> > DataBag Object:
> > >> >
> > >> > String databag_string = "{(apples,1024)}";
> > >> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
> > >> >
> > >> > I've tried implementing something along the lines of this, but I
> > >> > believe it's in the wrong direction, and then I get stuck:
> > >> >
> > >> >         String[] aliases = {"b1", "t1", "a", "b"};
> > >> >         byte[] types = {DataType.BAG, DataType.TUPLE, DataType.CHARARRAY, DataType.LONG};
> > >> >         // Note: this builds a flat schema of four sibling fields,
> > >> >         // not a bag that contains a tuple of (a, b).
> > >> >         List<Schema.FieldSchema> fsList = new ArrayList<Schema.FieldSchema>();
> > >> >         for (int i = 0; i < aliases.length; i++) {
> > >> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
> > >> >         }
> > >> >         Schema origSchema = new Schema(fsList);
> > >> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
> > >> >         Schema genSchema = Schema.getPigSchema(rsSchema);
> > >> >         ResourceSchema.ResourceFieldSchema[] rfschema = rsSchema.getFields();
> > >> >         // ... lost here, maybe Utf8StorageConverter c = new Utf8StorageConverter(); ???
> > >> >
> > >> >
> > >> > An ideal process would be along the lines of:
> > >> >
> > >> > DataBag d = BagFactory.getInstance().newDefaultBag();
> > >> > d.something(databag_string, schema_string);    // ??? no idea what this process could be
> > >> > // afterwards: d.toString().equals(databag_string) == true
> > >> >
> > >> > Thanks, -Dan
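The missing `d.something(databag_string, schema_string)` step is essentially a walk over the schema alongside the data. A minimal sketch, assuming the bag text has already been split into nested `java.util.List`s of `String`s, that every schema field has an alias, and that only `bag`, `tuple`, `chararray`, `int`, `long`, `float` and `double` appear; `SchemaTyper`, `parseSchema` and `apply` are hypothetical names. Real code would more likely build on Pig's own `Utils.getSchemaFromString` and `Utf8StorageConverter`. Note that in `{(a,(b,d),f)}` the fields declared `d:long` and `f:long` hold letters, which this strict sketch rejects with an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: type already-split nested values using a Pig-style schema string.
// Assumptions (not Pig's API): bags and tuples are java.util.Lists, leaves are
// Strings, every field has an alias, and only bag, tuple, chararray, int, long,
// float and double are supported.
final class SchemaTyper {
    // One schema node: an atom type name, or "bag"/"tuple" with child nodes.
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();
        Node(String type) { this.type = type; }
    }

    private final String s;
    private int pos = 0;

    private SchemaTyper(String schema) { this.s = schema.replace(" ", ""); }

    static Node parseSchema(String schema) {
        SchemaTyper p = new SchemaTyper(schema);
        Node root = p.field();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Trailing schema text at position " + p.pos);
        }
        return root;
    }

    // field := alias ':' type, where type is bag{field}, tuple(field,...) or an atom name
    private Node field() {
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        expect(':');
        int start = pos;
        while (pos < s.length() && Character.isLetter(s.charAt(pos))) pos++;
        Node n = new Node(s.substring(start, pos).toLowerCase(Locale.ROOT));
        if (n.type.equals("bag")) {
            expect('{');
            n.children.add(field());
            expect('}');
        } else if (n.type.equals("tuple")) {
            expect('(');
            do { n.children.add(field()); } while (accept(','));
            expect(')');
        }
        return n;
    }

    // Walks the schema and the value together, converting String leaves by type.
    static Object apply(Node n, Object value) {
        if (value == null) return null;
        switch (n.type) {
            case "bag": {
                List<Object> out = new ArrayList<>();
                for (Object t : (List<?>) value) out.add(apply(n.children.get(0), t));
                return out;
            }
            case "tuple": {
                List<?> fields = (List<?>) value;
                if (fields.size() != n.children.size()) {
                    throw new IllegalArgumentException(
                        "Expected " + n.children.size() + " fields, got " + fields.size());
                }
                List<Object> out = new ArrayList<>();
                for (int i = 0; i < fields.size(); i++) out.add(apply(n.children.get(i), fields.get(i)));
                return out;
            }
            case "chararray": return value.toString();
            case "int":       return Integer.valueOf(value.toString());
            case "long":      return Long.valueOf(value.toString());
            case "float":     return Float.valueOf(value.toString());
            case "double":    return Double.valueOf(value.toString());
            default: throw new IllegalArgumentException("Unsupported type: " + n.type);
        }
    }

    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
    }
}
```

Applied to `b1:bag{t1:tuple(a:chararray,b:long)}` and the split form of `{(apples,1024)}`, it returns a one-tuple bag holding `"apples"` and `1024L`.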
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Dan DeCapria
> > > CivicScience, Inc.
> > > Senior Informatics / DM / ML / BI Specialist
> > >
> >
> >
> >
> >