Pig >> mail # user >> String Representation of DataBag and its Schema


Dan DeCapria, CivicScienc... 2013-03-18, 20:18
Jonathan Coveney 2013-03-18, 22:31
Dan DeCapria, CivicScienc... 2013-03-19, 13:37
Dan DeCapria, CivicScienc... 2013-03-19, 15:16
Jonathan Coveney 2013-03-19, 15:27
Dan DeCapria, CivicScienc... 2013-03-19, 15:37
Jonathan Coveney 2013-03-19, 15:43
Dan DeCapria, CivicScienc... 2013-03-19, 15:52
Jonathan Coveney 2013-03-19, 16:08
Dan DeCapria, CivicScienc... 2013-03-19, 16:40
Jonathan Coveney 2013-03-19, 16:53
Jonathan Coveney 2013-03-19, 16:54
Dan DeCapria, CivicScienc... 2013-03-19, 17:20
William Oberman 2013-03-21, 15:51

Re: String Representation of DataBag and its Schema
Such that this string_databag matches the Schema:

        String string_databag = "{(apples,(banana,1024),2048)}";
        String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
        Schema schema = Utils.getSchemaFromString(string_schema);
        LogicalSchema logical_schema = Utils.parseSchema(string_schema);
        ResourceSchema rschema = new ResourceSchema(schema);

-Dan

On Tue, Mar 19, 2013 at 11:37 AM, Dan DeCapria, CivicScience <
[EMAIL PROTECTED]> wrote:

> String string_databag in this example was typed out by me, as the input
> String for a JUnit test method. I am considering generating many of these
> for case-specific unit testing of my UDFs.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
>> how was string_databag generated?
>>
>>
>> 2013/3/19 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>>
>> > Expanding upon this, the following use case's Schema Object can be
>> > resolved from inputs:
>> >
>> >         String string_databag = "{(a,(b,d),f)}";
>> >         String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
>> >         Schema schema = Utils.getSchemaFromString(string_schema);
>> >
>> > Next step is to resolve a DataBag Object from String string_databag and
>> > the Schema Object.
>> >
>> > -Dan
>> >
>> > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> > > Thank you for your reply.
>> > >
>> > > The problem is I cannot find a methodology to go from a String
>> > > representation of a complex data type to a nested Object of Pig
>> > > DataTypes. I looked over the Pig 0.10.1 docs, but cannot find a way to
>> > > go from String and Schema to Pig DataType Object.
>> > >
>> > > For context, I am generating these Strings for my own JUnit testing of
>> > > other UDFs. Currently, for complex types, I have to generate each
>> > > nesting from Tuple and DataBag factories, append data, and nest them
>> > > manually. For larger unit tests, this process becomes unwieldy (hundreds
>> > > of lines per method, non-dynamic), and it would be much simpler to go
>> > > directly from a String and a Schema to a DataBag Object for UDF testing
>> > > (few lines of code, easily modifiable).
>> > >
>> > > -Dan
>> > >
>> > >
>> > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Why not just use PigStorage? This is essentially what it does. It
>> > >> saves a bag as text, and then loads it again.
>> > >>
>> > >> I suppose the question becomes: why do you need to do this?
>> > >>
>> > >>
>> > >> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>> > >>
>> > >> > In Java, I am trying to convert a DataBag from its String
>> > >> > representation with its schema String to a valid DataBag Object:
>> > >> >
>> > >> > String databag_string = "{(apples,1024)}";
>> > >> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> > >> >
>> > >> > I've tried implementing something along the lines of this, but I
>> > >> > believe it's in the wrong direction, and then I get stuck:
>> > >> >
>> > >> >         String[] aliases = {"b1", "t1", "a", "b"};
>> > >> >         byte[] types = {DataType.BAG, DataType.TUPLE,
>> > >> >                 DataType.CHARARRAY, DataType.LONG};
>> > >> >         List<Schema.FieldSchema> fsList =
>> > >> >                 new ArrayList<Schema.FieldSchema>();
>> > >> >         for (int i = 0; i < aliases.length; i++) {
>> > >> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> > >> >         }
>> > >> >         Schema origSchema = new Schema(fsList);
>> > >> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> > >> >         Schema genSchema = Schema.getPigSchema(rsSchema);
>> > >> >         ResourceSchema.ResourceFieldSchema[] rfschema =
>> > >> >                 rsSchema.getFields();

Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist
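
For readers following the thread: the step discussed above, turning bag text such as "{(apples,(banana,1024),2048)}" into a nested object, can be sketched with a small recursive-descent parser. This is only an illustration under stated assumptions, not Pig's own loader code. The class BagTextParser and its parse method are hypothetical names and are not part of Pig. The result uses plain java.util.List values rather than Pig's DataBag and Tuple, every leaf is left as a String (no schema-driven conversion to long or other types is attempted), and quoting, escaping, and maps are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): parses Pig-style bag text such as
 * "{(apples,(banana,1024),2048)}" into nested Java lists with String leaves.
 */
final class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses a bag: the outer list holds tuples, each tuple is itself a list. */
    static List<Object> parse(String text) {
        BagTextParser p = new BagTextParser(text.trim());
        List<Object> bag = p.parseGroup('{', '}');
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Trailing input at index " + p.pos);
        }
        return bag;
    }

    /** Parses a comma-separated group: "{...}" for a bag, "(...)" for a tuple. */
    private List<Object> parseGroup(char open, char close) {
        expect(open);
        List<Object> items = new ArrayList<>();
        if (peek() == close) {          // empty bag or tuple
            pos++;
            return items;
        }
        while (true) {
            items.add(parseValue());
            if (peek() == ',') {        // another element follows
                pos++;
                continue;
            }
            expect(close);
            return items;
        }
    }

    /** Parses a nested tuple, a nested bag, or a scalar leaf kept as a String. */
    private Object parseValue() {
        char c = peek();
        if (c == '(') return parseGroup('(', ')');
        if (c == '{') return parseGroup('{', '}');
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }
}
```

Calling BagTextParser.parse("{(apples,(banana,1024),2048)}") yields one outer list containing a single three-element tuple whose second element is the nested tuple. In a real UDF test the leaves would still need to be converted according to the Schema and wrapped in Pig's Tuple and DataBag types, which this sketch leaves out.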