Hive >> mail # user >> Table schema size limit to 4000 chars ?


Re: Table schema size limit to 4000 chars ?
Ah, it seems the JSON parser issue was caused by my Avro schema containing // comments. I have seen mentions on the web that this parser can be configured to accept comments.

Is there a Hive property that can be passed to the JSON parser to allow comments in Avro schemas?
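The thread does not name a Hive property for this. Jackson 1.x does have a JsonParser.Feature.ALLOW_COMMENTS switch, but whether Hive's Avro SerDe exposes it as a setting is not established here. A workaround that does not depend on any such setting is to strip the comments from the schema before handing it to Hive. One possible reading of the trace below, which reports "line: 1", is that the schema reached the parser as a single line; if so, a // comment would hide everything after it, which would be consistent with the "Unexpected end-of-input" error. In that case the comments need to be removed while the original newlines are still present. The following is a minimal sketch under those assumptions; SchemaComments and strip are hypothetical names, not part of Hive, Avro or Jackson.

```java
// Hypothetical helper (not part of Hive, Avro or Jackson): removes // line
// comments and /* */ block comments from a JSON/Avro schema string while
// leaving the contents of string literals untouched.
public final class SchemaComments {

    private SchemaComments() {}

    public static String strip(String json) {
        StringBuilder out = new StringBuilder(json.length());
        int n = json.length();
        int i = 0;
        boolean inString = false;
        while (i < n) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < n) {
                    // Copy the escaped character as-is so \" does not end the string.
                    out.append(json.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    inString = false;
                }
                i++;
            } else if (c == '"') {
                inString = true;
                out.append(c);
                i++;
            } else if (c == '/' && i + 1 < n && json.charAt(i + 1) == '/') {
                // Line comment: skip up to, but not including, the next newline.
                int end = json.indexOf('\n', i);
                i = (end < 0) ? n : end;
            } else if (c == '/' && i + 1 < n && json.charAt(i + 1) == '*') {
                // Block comment: skip past the closing */ (or to the end if unterminated).
                int end = json.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String schema = "{\"type\": \"record\", // a comment\n"
                + " \"name\": \"User\", /* block */ \"fields\": []}";
        System.out.println(strip(schema));
    }
}
```

Because the scanner tracks string literals, a // sequence inside a quoted value (for example a URL in a "doc" field) is kept rather than treated as a comment.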

--
Alexandre Fouche

On Monday 17 December 2012 at 14:24, Alexandre Fouche wrote:

> Hi,  
>  
> I have an Avro table whose schema is around 8000 characters long, and I cannot query it:
>  
> First, I had an issue when creating the table: Hive threw an exception because the metastore column in MySQL (varchar(4000)) is too small. So I altered the column to varchar(10000), which fixed that part.
>  
> But when querying the table, Hive throws an exception saying that the JsonParser cannot find the end of the Avro schema array. It is basically the same issue as above: the Avro schema string is too long to be parsed by the third-party JSON parser used in Hive/Avro, org.codehaus.jackson.JsonParser. I do not really know whether this parser cannot parse JSON strings of arbitrary length or whether it has a hard-coded allocated string size.
>  
> Note: I am using Cloudera Hive 0.9, which has the Avro SerDe bundled.
>  
> Here is the exception that was thrown; org.codehaus.jackson.JsonParser is mentioned at the end:
>  
> (…)
> 12/12/17 10:49:55 WARN avro.AvroSerdeUtils: Encountered exception determining schema. Returning signal schema to indicate problem
> org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@a750bb9; line: 1, column: 37])
>  at [Source: java.io.StringReader@a750bb9; line: 1, column: 13980]
> at org.apache.avro.Schema$Parser.parse(Schema.java:983)
> at org.apache.avro.Schema$Parser.parse(Schema.java:971)
> at org.apache.avro.Schema.parse(Schema.java:1020)
> at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:61)
> at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:87)
> at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:59)
> at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
> at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
> at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
> at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
> at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:831)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:959)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7532)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:246)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
> at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:94)
> at org.apache.hive.service.cli.session.Session.executeStatement(Session.java:141)
> at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:120)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:169)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1107)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1096)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)