-
Parsing variable schema
Prashant Kommireddi 2012-12-12, 07:47
Here is a snippet showing how the schema is applied to tuples:

    import org.apache.pig.ResourceSchema;
    import org.apache.pig.impl.util.Utils;
    import org.apache.pig.parser.ParserException;

    // Look up the serialized schema string stored under this loader's signature
    String serializedSchema = p.getProperty(signature + SCHEMA_FILE);
    if (serializedSchema != null) {
        try {
            // Parse the string into a Pig Schema and wrap it as a ResourceSchema
            resourceSchema = new ResourceSchema(Utils.getSchemaFromString(serializedSchema));
        } catch (ParserException e) {
            mLog.error("Unable to parse serialized schema " + serializedSchema, e);
        }
    }

Is there a good way to define multiple "serializedSchema" values that could be applied to different types of tuples (different log lines)? I can push this logic into a UDF that parses each record based on a schema data structure built inside the UDF, but I'm wondering whether this can be done in the LoadFunc itself.
Thanks, Prashant
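One way to picture "multiple serializedSchema values" is a minimal sketch like the following, assuming one property per log type under a hypothetical "schema.<LogType>" key in a java.util.Properties object like the p in the snippet above. It deliberately does not call Pig's Utils.getSchemaFromString: the tiny flat "name:type" parser, the key convention, and the class, method and field names are all illustrative assumptions, not Pig API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class MultiSchemaRegistry {

    // One field of a flat "name:type" schema string.
    public static final class Field {
        public final String name;
        public final String type;

        public Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Hypothetical key convention: one property per log type, e.g. "schema.A".
    public static final String KEY_PREFIX = "schema.";

    // Parses "date:chararray,id:chararray" into an ordered field list.
    // Minimal on purpose: no nested tuples, bags or maps (Pig's own parser handles those).
    public static List<Field> parse(String serialized) {
        List<Field> fields = new ArrayList<>();
        for (String part : serialized.split(",")) {
            String[] nameAndType = part.trim().split(":", 2);
            if (nameAndType.length != 2 || nameAndType[0].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad field spec: " + part);
            }
            fields.add(new Field(nameAndType[0].trim(), nameAndType[1].trim()));
        }
        return fields;
    }

    // Builds log type -> schema from every "schema.<LogType>" property; other keys are ignored.
    public static Map<String, List<Field>> load(Properties p) {
        Map<String, List<Field>> byType = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX)) {
                String logType = key.substring(KEY_PREFIX.length());
                byType.put(logType, parse(p.getProperty(key)));
            }
        }
        return byType;
    }

    // Position of a named field, or -1 if this log type does not have it.
    public static int indexOf(List<Field> schema, String name) {
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("schema.A", "logType:chararray,date:chararray,id:chararray,value:long");
        p.setProperty("schema.B", "logType:chararray,id:chararray,date:chararray");
        for (Map.Entry<String, List<Field>> e : load(p).entrySet()) {
            System.out.println(e.getKey() + ": date at index " + indexOf(e.getValue(), "date"));
        }
    }
}
```

Running main prints "A: date at index 1" and "B: date at index 2". Looking fields up by name rather than by position is what lets one loader cope with layouts that differ per log type.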
-
Re: Parsing variable schema
Jonathan Coveney 2012-12-12, 18:07
I'm a little vague on what you want to do. Can you provide an example?
-
Re: Parsing variable schema
Prashant Kommireddi 2012-12-13, 07:51
Let's say we have semi-structured logs in which the first column is always the LogType (could be A, B, C, xyz...)
A,20120101,Ax8221Za,1233122
B,Ux231asd,20120101,
Each LogType has its own schema; notice that the date appears at index 1 for LogType=A but at index 2 for LogType=B.
Is there a good way to deal with a variable schema in a LoadFunc?
-Prashant
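A Pig load declares a single schema for the whole relation (LoadMetadata.getSchema returns one ResourceSchema), so one common way to handle the LogType example is to project every line onto the union of all fields and leave absent ones null. The sketch below shows only that per-line dispatch in plain Java; the field names other than date (id, value), the class name and the layouts are hypothetical, inferred from the two sample lines. In an actual LoadFunc, getNext() would wrap each row in a Tuple, which is not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogTypeDispatcher {

    // Hypothetical per-LogType layouts based on the sample lines:
    // date sits at index 1 for A and at index 2 for B.
    static final Map<String, List<String>> LAYOUTS = new HashMap<>();
    static {
        LAYOUTS.put("A", Arrays.asList("logType", "date", "id", "value"));
        LAYOUTS.put("B", Arrays.asList("logType", "id", "date"));
    }

    // The single schema the loader would declare: the union of all fields.
    public static final List<String> OUTPUT_FIELDS =
            Arrays.asList("logType", "date", "id", "value");

    // Turns one raw line into a row aligned with OUTPUT_FIELDS.
    // Returns null for unknown LogTypes (a real loader might skip or count them).
    public static String[] toRow(String line) {
        // The -1 limit keeps trailing empty fields, so column positions are never shifted.
        String[] cols = line.split(",", -1);
        List<String> layout = LAYOUTS.get(cols[0]);
        if (layout == null) {
            return null;
        }
        Map<String, String> byName = new HashMap<>();
        for (int i = 0; i < layout.size() && i < cols.length; i++) {
            byName.put(layout.get(i), cols[i]);
        }
        String[] row = new String[OUTPUT_FIELDS.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = byName.get(OUTPUT_FIELDS.get(i)); // null if this LogType lacks the field
        }
        return row;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"A,20120101,Ax8221Za,1233122", "B,Ux231asd,20120101,"}) {
            System.out.println(Arrays.toString(toRow(line)));
        }
    }
}
```

Running main prints [A, 20120101, Ax8221Za, 1233122] and [B, 20120101, Ux231asd, null], so date lands in the same output column for both LogTypes. An alternative under the same single-schema constraint is to emit the LogType plus a Pig map of name/value pairs, trading a fixed, typed schema for flexibility.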