Pig >> mail # user >> Parsing variable schema


Parsing variable schema
Here is a snippet showing how the schema is applied to tuples:

String serializedSchema = p.getProperty(signature + SCHEMA_FILE);
if (serializedSchema != null) {
    try {
        resourceSchema = new ResourceSchema(Utils.getSchemaFromString(serializedSchema));
    } catch (ParserException e) {
        mLog.error("Unable to parse serialized schema " + serializedSchema, e);
    }
}
Is there a good way to define multiple "serializedSchema" values that could
be applied to different types of tuples (different log lines)? I am able to
push this logic into a UDF that parses a record based on a schema data
structure I build within it. I am wondering whether this can be done in the
LoadFunc itself.
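
To make the question concrete, here is a minimal sketch of the per-type lookup I have in mind, written in plain Java with no Pig dependencies so it runs on its own. Everything in it is an assumption for illustration: the class name MultiSchemaRegistry, the convention that the first column of a log line names its record type, and the simple "name:type" schema strings. In a real loader each schema string would presumably go through Utils.getSchemaFromString as in the snippet above, rather than being split by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: keeps one serialized schema per record type.
class MultiSchemaRegistry {
    // Record-type tag -> serialized schema, e.g. "ts:long, user:chararray".
    private final Map<String, String> schemas = new LinkedHashMap<>();

    void register(String recordType, String serializedSchema) {
        schemas.put(recordType, serializedSchema);
    }

    // Returns the field names of the schema registered for the type, or null if none.
    List<String> fieldNames(String recordType) {
        String serialized = schemas.get(recordType);
        if (serialized == null) {
            return null;
        }
        List<String> names = new ArrayList<>();
        for (String field : serialized.split(",")) {
            // Keep only the part before ':' (the field name), dropping the type.
            names.add(field.split(":")[0].trim());
        }
        return names;
    }

    // Assumes the first column of the line is the record type; the remaining
    // columns are paired with the field names of that type's schema.
    Map<String, String> parse(String line, String delimiter) {
        String[] tokens = line.split(Pattern.quote(delimiter), -1);
        List<String> names = fieldNames(tokens[0]);
        if (names == null) {
            throw new IllegalArgumentException("No schema registered for type: " + tokens[0]);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            int valueIndex = i + 1; // skip the type column
            record.put(names.get(i), valueIndex < tokens.length ? tokens[valueIndex] : null);
        }
        return record;
    }
}
```

With "login" registered as "ts:long, user:chararray" and "click" as "ts:long, url:chararray, status:int", a tab-separated line starting with "click" would be parsed against the three click fields, and one starting with "login" against the two login fields. Missing trailing columns come back as null rather than failing.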

Thanks,
Prashant