Pig, mail # user - String Representation of DataBag and its Schema


String Representation of DataBag and its Schema
Dan DeCapria, CivicScienc... 2013-03-18, 20:18
In Java, I am trying to convert a DataBag from its String representation,
together with its schema String, into a valid DataBag object:

String databag_string = "{(apples,1024)}";
String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";

I've tried implementing something along the lines of this, but I believe
it's in the wrong direction, and then I get stuck:

        String[] aliases = {"b1", "t1", "a", "b"};
        byte[] types = {DataType.BAG, DataType.TUPLE, DataType.CHARARRAY, DataType.LONG};
        List<Schema.FieldSchema> fsList = new ArrayList<Schema.FieldSchema>();
        for (int i = 0; i < aliases.length; i++) {
            fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
        }
        Schema origSchema = new Schema(fsList);
        ResourceSchema rsSchema = new ResourceSchema(origSchema);
        Schema genSchema = Schema.getPigSchema(rsSchema);
        ResourceSchema.ResourceFieldSchema[] rfschema = rsSchema.getFields();
        // ... lost here, maybe Utf8StorageConverter c = new Utf8StorageConverter(); ???

An ideal process would be along the lines of:

DataBag d = BagFactory.getInstance().newDefaultBag();
d.something(databag_string, schema_string);    // ??? no idea what this process could be
// afterwards, d.toString().equals(databag_string) should be true

Thanks, -Dan
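
To make the desired round trip concrete, here is a minimal, library-free sketch of it. It does not use Pig at all: plain Java lists stand in for DataBag and Tuple, and the `BagStringSketch` class, its `FieldType` enum, and the `parseBag`/`format` methods are all hypothetical names invented for illustration. It assumes a flat bag of tuples whose fields contain no commas or parentheses, and it takes the field types directly rather than parsing the schema string. Within Pig itself, `Utils.getSchemaFromString` and `Utf8StorageConverter.bytesToBag` look like the relevant entry points, but that is an assumption to check against the Pig version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Pig API: lists play the role of DataBag and Tuple.
final class BagStringSketch {

    /** Stand-in for Pig's field types; only the two used in the post. */
    enum FieldType { CHARARRAY, LONG }

    /** Parses text such as "{(apples,1024)}" into typed rows. */
    static List<List<Object>> parseBag(String text, FieldType[] types) {
        String s = text.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a bag: " + text);
        }
        String body = s.substring(1, s.length() - 1);
        List<List<Object>> bag = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = body.indexOf('(', pos);
            if (open < 0) {
                break; // no more tuples
            }
            int close = body.indexOf(')', open);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed tuple in: " + text);
            }
            String[] parts = body.substring(open + 1, close).split(",", -1);
            if (parts.length != types.length) {
                throw new IllegalArgumentException("field count does not match schema");
            }
            List<Object> tuple = new ArrayList<>();
            for (int i = 0; i < parts.length; i++) {
                String raw = parts[i].trim();
                // Convert each field according to its declared type.
                if (types[i] == FieldType.LONG) {
                    tuple.add(Long.valueOf(raw));
                } else {
                    tuple.add(raw);
                }
            }
            bag.add(tuple);
            pos = close + 1;
        }
        return bag;
    }

    /** Renders rows back into the "{(a,1),(b,2)}" form. */
    static String format(List<List<Object>> bag) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < bag.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('(');
            List<Object> tuple = bag.get(i);
            for (int j = 0; j < tuple.size(); j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(tuple.get(j));
            }
            sb.append(')');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String databagString = "{(apples,1024)}";
        FieldType[] types = { FieldType.CHARARRAY, FieldType.LONG };
        List<List<Object>> bag = parseBag(databagString, types);
        // The second field is now a Long, and the text form round-trips.
        System.out.println(format(bag).equals(databagString));
    }
}
```

The point of the sketch is only that the conversion needs the per-field types from the schema to decide that `1024` becomes a long while `apples` stays a string; nested bags, maps, and quoting are deliberately left out.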