Hive >> mail # user >> Table schema size limit to 4000 chars ?


Table schema size limited to 4000 chars?
Hi,  

I have an Avro table whose schema is around 8000 characters long, and I cannot query it:

First, I had a problem when creating the table: Hive threw an exception because the field in MySQL (varchar(4000)) was too small to hold the schema. I altered the column to varchar(10000), which fixed that part.

But when I query the table, Hive throws an exception saying that the JsonParser cannot find the end of an array in the Avro schema. It is basically the same issue as above: the Avro schema string is too long to be parsed by the third-party JSON parser, org.codehaus.jackson.JsonParser, used by Hive/Avro. I do not know whether this parser is unable to parse JSON strings of arbitrary length or whether it has a hardcoded string buffer size.
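
To narrow this down, one thing worth checking is whether the schema string that reaches the parser is complete. The sketch below is only an illustration: it uses Python's standard json module, not Jackson, so it says nothing about Jackson's limits. The record name, the field names, and the helpers build_schema and try_parse are made up for the example. It shows that a long schema parses fine when intact, while the same string cut at 4000 characters (what a varchar(4000) column would keep) fails with an end-of-input style error.

```python
import json


def build_schema(num_fields):
    """Build an Avro-style record schema (as a JSON string) with many fields.

    The record and field names are hypothetical, chosen only to make the
    string long enough for the demonstration.
    """
    fields = [
        {"name": "field_%d" % i, "type": ["null", "string"]}
        for i in range(num_fields)
    ]
    return json.dumps({"type": "record", "name": "Example", "fields": fields})


def try_parse(text):
    """Return (True, None) if text is valid JSON, else (False, error message)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as exc:
        return False, str(exc)


full_schema = build_schema(200)   # well over 4000 characters
truncated = full_schema[:4000]    # what a varchar(4000) column would keep

for label, text in (("full", full_schema), ("truncated", truncated)):
    ok, error = try_parse(text)
    print(label, len(text), "parses" if ok else "fails: %s" % error)
```

If the string read back from the database turns out to be shorter than the one that was written, the problem would be in how the schema is stored rather than in the parser. That is an assumption to verify, not a confirmed diagnosis.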

Note that I am using Cloudera Hive 0.9, which has the Avro SerDe bundled.

Here is the exception that is thrown. org.codehaus.jackson.JsonParser is mentioned at the end:

(…)
12/12/17 10:49:55 WARN avro.AvroSerdeUtils: Encountered exception determining schema. Returning signal schema to indicate problem
org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@a750bb9; line: 1, column: 37])
 at [Source: java.io.StringReader@a750bb9; line: 1, column: 13980]
at org.apache.avro.Schema$Parser.parse(Schema.java:983)
at org.apache.avro.Schema$Parser.parse(Schema.java:971)
at org.apache.avro.Schema.parse(Schema.java:1020)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:61)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:87)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:59)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:831)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:959)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7532)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:246)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:94)
at org.apache.hive.service.cli.session.Session.executeStatement(Session.java:141)
at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:120)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:169)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1107)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1096)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@a750bb9; line: 1, column: 37])
 at [Source: java.io.StringReader@a750bb9; line: 1, column: 13980]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:318)
at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:354)
at org.codehaus.jackson.impl.ReaderBasedParser._skipWSOrEnd(ReaderBasedParser.java:955)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:247)
at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:200)
at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:216)
at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:187)
at org.codehaus.jackson.map.deser.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:213)
at org.codehaus.jackson.map.deser.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:56)
at org.codehaus.jackson.map.deser.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:13)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2383)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1234)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1209)
at org.apache.avro.Schema$Parser.parse(Schema.java:981)
... 30 more

Alexandre Fouche