Pig, mail # user - String Representation of DataBag and its Schema


Dan DeCapria, CivicScienc... 2013-03-18, 20:18
Jonathan Coveney 2013-03-18, 22:31
Dan DeCapria, CivicScienc... 2013-03-19, 13:37
Dan DeCapria, CivicScienc... 2013-03-19, 15:16
Jonathan Coveney 2013-03-19, 15:27
Dan DeCapria, CivicScienc... 2013-03-19, 15:37
Jonathan Coveney 2013-03-19, 15:43
Dan DeCapria, CivicScienc... 2013-03-19, 15:52
Jonathan Coveney 2013-03-19, 16:08
Dan DeCapria, CivicScienc... 2013-03-19, 16:40
Jonathan Coveney 2013-03-19, 16:53
Jonathan Coveney 2013-03-19, 16:54
Dan DeCapria, CivicScienc... 2013-03-19, 17:20
William Oberman 2013-03-21, 15:51
Re: String Representation of DataBag and its Schema
Dan DeCapria, CivicScienc... 2013-03-19, 15:43
Such that this string_databag matches the Schema:

        String string_databag = "{(apples,(banana,1024),2048)}";
        String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
        Schema schema = Utils.getSchemaFromString(string_schema);
        LogicalSchema logical_schema = Utils.parseSchema(string_schema);
        ResourceSchema rschema = new ResourceSchema(schema);

-Dan
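
For the test-fixture use case described in this thread, the parsing step can be sketched without Pig on the classpath. The class below is a hypothetical helper (BagTextParser is not a Pig class): it reads text such as string_databag into nested java.util.List values rather than real DataBag and Tuple objects, and it types fields with a digits-only heuristic instead of consulting the Schema. Treat it as a minimal sketch under those assumptions. As far as I know, Pig's own text-to-bag path is Utf8StorageConverter.bytesToBag(byte[], ResourceFieldSchema), which does use the schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Pig: parses Pig-style bag text into nested Lists. */
final class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses one value: a bag "{...}", a tuple "(...)", or a scalar field. */
    static Object parse(String text) {
        BagTextParser parser = new BagTextParser(text.trim());
        Object value = parser.parseValue();
        if (parser.pos != parser.text.length()) {
            throw new IllegalArgumentException("Trailing input at offset " + parser.pos);
        }
        return value;
    }

    /** Returns the current character, or '\0' once the input is exhausted. */
    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private Object parseValue() {
        char c = peek();
        if (c == '{') {
            return parseGroup('}'); // bag: a list of tuples (not enforced here)
        }
        if (c == '(') {
            return parseGroup(')'); // tuple: a list of fields
        }
        return parseScalar();
    }

    /** Parses comma-separated values up to the given closing delimiter. */
    private List<Object> parseGroup(char close) {
        List<Object> items = new ArrayList<>();
        pos++; // consume the opening delimiter
        if (peek() == close) {
            pos++; // empty bag or tuple
            return items;
        }
        while (true) {
            items.add(parseValue());
            char c = peek();
            pos++;
            if (c == close) {
                return items;
            }
            if (c != ',') {
                throw new IllegalArgumentException(
                        "Expected ',' or '" + close + "' at offset " + (pos - 1));
            }
        }
    }

    /**
     * Reads a field up to the next delimiter. Assumption: all-digit fields
     * become Long and everything else stays a String. A schema-driven parser
     * would decide the type from the field's declared type instead.
     */
    private Object parseScalar() {
        int start = pos;
        while (pos < text.length() && ",)}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        String raw = text.substring(start, pos).trim();
        return raw.matches("-?\\d+") ? (Object) Long.valueOf(raw) : raw;
    }
}
```

Calling BagTextParser.parse("{(apples,(banana,1024),2048)}") yields a list whose toString() is [[apples, [banana, 1024], 2048]], with 1024 and 2048 held as Long values. A chararray field that happens to contain only digits would be mis-typed by this heuristic, which is the main reason to prefer a schema-aware converter for real tests.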

On Tue, Mar 19, 2013 at 11:37 AM, Dan DeCapria, CivicScience <
[EMAIL PROTECTED]> wrote:

> String string_databag in this example was typed out by me, as the input
> String for a JUnit test method. I am considering generating many of these
> for case specific unit testing of my UDFs.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
>> how was string_databag generated?
>>
>>
>> 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>>
>> > Expanding upon this, the following use case's Schema Object can be
>> > resolved from inputs:
>> >
>> >         String string_databag = "{(a,(b,d),f)}";
>> >         String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
>> >         Schema schema = Utils.getSchemaFromString(string_schema);
>> >
>> > Next step is to resolve a DataBag Object from String string_databag and
>> > the Schema Object.
>> >
>> > -Dan
>> >
>> > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> > > Thank you for your reply.
>> > >
>> > > The problem is I cannot find a methodology to go from a String
>> > > representation of a complex data type to a nested Object of pig
>> > > DataTypes. I looked over the pig 0.10.1 docs, but cannot find a way to
>> > > go from String and Schema to pig DataType Object.
>> > >
>> > > For context, I am generating these Strings for my own JUnit testing of
>> > > other UDFs.  Currently, for complex types, I have to generate each
>> > > nesting from Tuple and DataBag factories, append data, and nest them
>> > > manually.  For larger unit tests, this process becomes unwieldy
>> > > (hundreds of lines per method, non-dynamic), and it would be much
>> > > simpler to go directly from a String and a Schema to a DataBag Object
>> > > for UDF testing (few lines of code, easily modifiable).
>> > >
>> > > -Dan
>> > >
>> > >
>> > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[EMAIL PROTECTED]>
>> > > wrote:
>> > >
>> > >> Why not just use PigStorage? This is essentially what it does. It
>> > >> saves a bag as text, and then loads it again.
>> > >>
>> > >> I suppose the question becomes: why do you need to do this?
>> > >>
>> > >>
>> > >> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>> > >>
>> > >> > In Java, I am trying to convert a DataBag from its String
>> > >> > representation with its schema String to a valid DataBag Object:
>> > >> >
>> > >> > String databag_string = "{(apples,1024)}";
>> > >> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> > >> >
>> > >> > I've tried implementing something along the lines of this, but I
>> > >> > believe it's in the wrong direction, and then I get stuck:
>> > >> >
>> > >> >         String[] aliases = {"b1", "t1", "a", "b"};
>> > >> >         byte[] types = {DataType.BAG, DataType.TUPLE,
>> > >> >                 DataType.CHARARRAY, DataType.LONG};
>> > >> >         List<Schema.FieldSchema> fsList =
>> > >> >                 new ArrayList<Schema.FieldSchema>();
>> > >> >         for (int i = 0; i < aliases.length; i++) {
>> > >> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> > >> >         }
>> > >> >         Schema origSchema = new Schema(fsList);
>> > >> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> > >> >         Schema genSchema = Schema.getPigSchema(rsSchema);
>> > >> >         ResourceSchema.ResourceFieldSchema[] rfschema =
>> > >> >                 rsSchema.getFields();

Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist