Hive 0.11.0 | Issue with ORC Tables


Re: Hive 0.11.0 | Issue with ORC Tables
On Thu, Sep 19, 2013 at 5:04 AM, Savant, Keshav <[EMAIL PROTECTED]> wrote:

>  Hi All,
>
> We have set up Apache Hive 0.11.0 on a Hadoop cluster (Apache Hadoop
> 0.20.203.0). Hive returns the expected results when tables are stored
> as TextFile. However, select queries against tables stored as ORC
> (Optimized Row Columnar), the new format in Hive 0.11.0, throw an
> exception.
>
> Stack trace of the exception:
>
> 2013-09-19 20:33:38,095 ERROR CliDriver
> (SessionState.java:printError(386)) - Failed with exception
> java.io.IOException:com.google.protobuf.InvalidProtocolBufferException:
> While parsing a protocol message, the input ended unexpectedly in the
> middle of a field.  This could mean either than the input has been
> truncated or that an embedded message misreported its own length.
> java.io.IOException: com.google.protobuf.InvalidProtocolBufferException:
> While parsing a protocol message, the input ended unexpectedly in the
> middle of a field.  This could mean either than the input has been
> truncated or that an embedded message misreported its own length.
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: While
> parsing a protocol message, the input ended unexpectedly in the middle of a
> field.  This could mean either than the input has been truncated or that an
> embedded message misreported its own length.
>         at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:49)
>         at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:754)
>         at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:294)
>         at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
>         at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:10129)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:9993)
>         at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:9970)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:193)
>         at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:56)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168)
The problem is that LOAD DATA doesn't convert the file into ORC format;
it just moves the file into the table's directory, so the ORC reader ends
up trying to parse plain text. You need to load the file into a TextFile
staging table and then copy it into the ORC table:

-- Stage the raw text file in a plain TextFile table first.
CREATE TABLE person_staging (id INT, name STRING);

-- LOAD DATA only moves the file; the staging table reads it as text.
LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person_staging;

SELECT * FROM person_staging;

-- The INSERT runs a job that rewrites the rows through the ORC
-- table's output format, producing real ORC files.
INSERT OVERWRITE TABLE person SELECT * FROM person_staging;

SELECT * FROM person;
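
For reference, this assumes person is the ORC table from the original
post. Its definition wasn't quoted above, so the following is only a
guess at its shape:

-- Hypothetical definition of the ORC table; the actual columns were
-- not shown in the quoted message.
CREATE TABLE person (id INT, name STRING)
STORED AS ORC;

With a table like that, a bare LOAD DATA drops test.txt, still plain
text, straight into the table's warehouse directory, and that is the
file the ORC reader then fails to parse.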

Sorry for the bad error message. I improved the ORC reader to explicitly
check that the file is actually an ORC file in HIVE-4724
(https://issues.apache.org/jira/browse/HIVE-4724).
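
If you hit this before picking up that fix, one way to confirm the
failure mode is to look at what LOAD DATA actually left behind. From the
Hive CLI (the warehouse path below is an assumption based on the default
hive.metastore.warehouse.dir):

-- After a bare LOAD DATA into the ORC table, you would see the
-- original test.txt here instead of ORC files written by a Hive job.
dfs -ls /user/hive/warehouse/person/;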
