-
Table schema size limit to 4000 chars ?
Alexandre Fouche 2012-12-17, 13:24
Hi,
I have an Avro table whose schema is around 8000 characters long, and I cannot query it:
First, I had an issue when creating the table: Hive threw an exception because the field in MySQL (varchar(4000)) was too small. I altered the column to varchar(10000), which fixed that part.
But when querying the table, Hive throws an exception saying that the JsonParser cannot find the end of the Avro schema array. It looks like basically the same issue as above: the Avro schema string seems too long to be parsed by the third-party JSON parser org.codehaus.jackson.JsonParser used in Hive/Avro. I do not know whether this parser cannot parse JSON strings of arbitrary length or whether it has a hardcoded buffer size.
Note that I am using Cloudera Hive 0.9, which has the Avro SerDe bundled.
Here is the thrown exception. org.codehaus.jackson.JsonParser is mentioned at the end.
(…)
12/12/17 10:49:55 WARN avro.AvroSerdeUtils: Encountered exception determining schema. Returning signal schema to indicate problem
org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@a750bb9; line: 1, column: 37])
 at [Source: java.io.StringReader@a750bb9; line: 1, column: 13980]
    at org.apache.avro.Schema$Parser.parse(Schema.java:983)
    at org.apache.avro.Schema$Parser.parse(Schema.java:971)
    at org.apache.avro.Schema.parse(Schema.java:1020)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:61)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:87)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:59)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:831)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:959)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7532)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:246)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
    at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:94)
    at org.apache.hive.service.cli.session.Session.executeStatement(Session.java:141)
    at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:120)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:169)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1107)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1096)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@a750bb9; line: 1, column: 37])
 at [Source: java.io.StringReader@a750bb9; line: 1, column: 13980]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:318)
    at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:354)
    at org.codehaus.jackson.impl.ReaderBasedParser._skipWSOrEnd(ReaderBasedParser.java:955)
    at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:247)
    at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:200)
    at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:216)
    at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:187)
    at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:213)
    at org.codehaus.jackson.map.deser.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:56)
    at org.codehaus.jackson.map.deser.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:13)
    at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2383)
    at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1234)
    at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1209)
    at org.apache.avro.Schema$Parser.parse(Schema.java:981)
    ... 30 more
Alexandre Fouche
-
Re: Table schema size limit to 4000 chars ?
Alexandre Fouche 2012-12-17, 13:58
Ah, it seems the JSON parser issue was due to my Avro schema containing // comments. I have read on the web that this parser can be configured to accept comments.
Is there a Hive property that can be passed to the JSON parser to allow comments in Avro schemas?
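One way to sidestep the question is to strip the // comments before the schema is handed to Hive. The sketch below is only an illustration in Python (not something from the thread): the helper name strip_line_comments and the sample schema are hypothetical, and it handles // line comments only, not /* */ blocks. It leaves // sequences inside JSON string literals alone, then checks the result with the standard json module. As an unconfirmed guess, if the literal is stored on a single line (the trace reports line: 1), a // comment would swallow everything up to the end of the input, which would be consistent with the "Unexpected end-of-input" message.

```python
import json


def strip_line_comments(text):
    """Remove // comments that sit outside JSON string literals.

    Hypothetical helper for illustration; /* ... */ comments are not handled.
    """
    out = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so an escaped quote
                # does not end the string early.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Made-up sample schema with comments, for demonstration only.
commented = """{
  "type": "record",  // top-level record
  "name": "Event",
  "fields": [
    {"name": "url", "type": "string", "doc": "e.g. http://example.com"}  // trailing note
  ]
}"""

clean = strip_line_comments(commented)
schema = json.loads(clean)  # raises ValueError if the result is not valid JSON
print(schema["name"], len(schema["fields"]))
```

The cleaned text can then be used as the schema literal; the URL inside the "doc" string is preserved because the scanner tracks whether it is inside a string.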
-- Alexandre Fouche
-
Re: Table schema size limit to 4000 chars ?
Jakob Homan 2013-01-22, 22:20
There shouldn't be any problems with comments in Avro schemas. You just need to make sure they're escaped properly. We did run into a problem with schema.literal values longer than 4k (the size of the backing MySQL varchar field), so internally we just bump this value for our Hive installs:
ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE varchar(20000);
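
Before widening the metastore column, it can help to know how long the schema literal really is once insignificant whitespace is removed. The following is a minimal Python sketch, not part of the thread: the names compact_schema and fits_param_value are hypothetical, the 4000-character default simply mirrors the varchar(4000) width reported above, and it assumes the schema is already valid JSON without comments.

```python
import json

# Width of the metastore column as reported in the thread; adjust it if
# the column has been altered in your install.
DEFAULT_LIMIT = 4000


def compact_schema(schema_text):
    """Re-serialize the schema without insignificant whitespace."""
    parsed = json.loads(schema_text)
    return json.dumps(parsed, separators=(",", ":"), ensure_ascii=False)


def fits_param_value(schema_text, limit=DEFAULT_LIMIT):
    """Return True if the compacted schema has at most `limit` characters."""
    return len(compact_schema(schema_text)) <= limit


# Made-up pretty-printed schema, for demonstration only.
pretty = json.dumps({"type": "record", "name": "E", "fields": []}, indent=4)
print(len(pretty), len(compact_schema(pretty)), fits_param_value(pretty))
```

If the compacted schema still exceeds the limit, widening the column as shown above remains the fix described in this thread.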